I had a feeling once about Mathematics — that I saw it all. Depth beyond depth 
was revealed to me — the Byss and Abyss. I saw — as one might see the transit 
of Venus or even the Lord Mayor’s Show — a quantity passing through infinity 
and changing its sign from plus to minus. I saw exactly why it happened and 
why the tergiversation was inevitable but it was after dinner and I let it go. 


Sir Winston Churchill (1874-1965) 
Foreword 


I am delighted to be allowed to add a few words to this book by Julian Havil, 
who is a teacher of mathematics at the school where I was a student sixty years 
ago. I fell in love with mathematics at the school and have been a professional 
mathematician ever since. 

This book is not for professional mathematicians but rather it is aimed at 
students of mathematics, be they eager high school students or undergraduates, 
and those who teach them. It is an inspiring book that will give them an idea of 
how enchanting mathematics can be. 

Mathematics is often thought to be difficult and dull. Many people avoid it 
as much as they can and as a result much of the population is mathematically 
illiterate. This is in part due to the relative lack of importance given to numeracy 
in our culture, and to the way that the subject has been presented to students. It 
could be argued that the two most widely used approaches to teaching mathematics,
at school level and beyond, have themselves contributed to this level of
mathematical illiteracy. 

The first approach was the ‘boot-camp’ method of drill and exercise that 
prepared students well for examinations but often did not enable them to develop 
a real understanding of mathematics. It mostly failed to encourage students to 
see the beauty and enjoyment to be gained from the subject. I remember this 
style well from my school years, where we used the successful and influential 
textbooks written by our own head of mathematics, Clement Durell. 

The second approach, very much in fashion when my own children were at 
school, was called ‘New Math’ and was a reaction to the dullness and shallowness
of the old way of teaching. The New Math teaching was based on the
idea that children should learn to understand modern mathematical concepts 
before they learned to solve practical problems, hence students would learn 
about sets and relations before they had mastered multiplication and division. 
Students learned the vocabulary of modern mathematics without understanding 
the substance. After a few years of New Math, mathematical literacy declined 
precipitously. 

Is there a third approach that could be more successful? I believe there is a 
promising third way, and this book by Havil shows us where to find it. The third 
way is to use a historical approach to mathematics, teaching the practical skills 
that students need, but in the context of the history of the time when these skills 
were first developed. 

Havil has chosen the 18th century as the context to be studied. This is the right
choice. In the 18th century, the tricks and ideas of higher mathematics arose
naturally out of the practical problems of the day. The sharp modern divisions of 
mathematics into pure and applied, abstract and concrete, did not yet exist. The 
presiding genius of Leonhard Euler created the language and the style in which 
mathematics has developed ever since. This book is centred on the personality 
of Euler and the ideas that he left for his successors to use and ponder. Euler’s 
ideas are simple enough to be accessible, and deep enough to give a feeling for 
the beauty of real mathematics. 

In this book, as is so often the case in mathematics, a little effort on the part of 
the reader will open a world of ideas. The book is so much more than an account 
of a few subjects within mathematics or a list of examples of Euler’s genius. 
Anyone who has the least inkling that mathematics is important, interesting and 
beautiful will find the book inspiring, and very enjoyable. 

In conclusion I say to the teachers and students who may use this book: Here 
is a cupboard full of bottles of vintage wine. Now drink! 


Freeman Dyson 
Introduction 


The last thing one knows when writing a book is what to put first. 

Blaise Pascal (1623-1662) 


It is tempting to think that there are just three special mathematical constants: 
π, e and i. In fact there are many, each with its own definition, each originating
in some natural way in its own area of mathematics, each given a special
symbol and a name too. They need symbols to represent them because they are 
awkward; that is, they have no convenient, finite numeric representation and no 
patterned infinite one: the ratio of the circumference to the diameter of any circle 
is not 3.142 but 3.141 59..., which is as mysterious as $(2.718\,28\ldots)^x$
essentially being the only function equal to its own derivative; in each case the 
trailing dots suggest the irrationality (let alone transcendence) of the numbers. 
Compared with these, writing i for √−1 is a small convenience. The number,
now universally known as Gamma, is generally accepted to be the most significant
of the ‘constants obscura’ and as such is the fourth important special
constant of mathematics; its symbol is the Greek letter γ and the constant it
represents is forever associated with the name of the Swiss genius, Leonhard 
Euler (1707-1783). Its value is the unprepossessing 0.577 215 6..., with its 
own trailing dots making the same suggestions about its character — but unlike 
its illustrious colleagues, so far they remain no more than suggestions. 

This book is an exploration of γ and inescapably this means that it is also an
exploration of logarithms and the harmonic series, since it is the interrelationship
between them that Euler exploited to define his constant as

$$\gamma = \lim_{n \to \infty}\left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} - \ln n\right)$$


where ln is the ubiquitous log to the base e, derived from the French expression
‘logarithme naturel’; the harmonic series, which occupies a less publicized
place in mathematical literature, is its discrete counterpart: 


$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.$$
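The slow approach of $H_n - \ln n$ to its limit is easy to watch numerically. The Python sketch below is illustrative rather than historical: it sums the series in ordinary floating point, and the names `harmonic` and `gamma_estimate` are our own. Since $H_n - \ln n$ exceeds γ by roughly $1/(2n)$, even $n = 100\,000$ agrees with γ only to within about 0.000 005.

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, added from the smallest term upwards
    to limit floating-point rounding."""
    return sum(1.0 / k for k in range(n, 0, -1))

def gamma_estimate(n):
    """The n-th term of Euler's limit: H_n - ln n (math.log is the natural log)."""
    return harmonic(n) - math.log(n)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}:  H_n - ln n = {gamma_estimate(n):.10f}")
```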


The mid 1970s brought with it the hand-held, microchip-centred, battery-powered,
comparatively cheap calculator, thereby bringing to an end the role
of logarithms and the slide rule as calculative aids. Yet the appearance of them
in a piece of mathematics is seldom a cause for surprise. Anyone who has 
studied calculus would see them materialize time and again, quite probably in 
the expression for the integral of some function or in their role as the inverse of 
the exponential function, with e vying with π for constant supremacy. They can 
also arise without warning in situations that seem remote from their influence, 
and when they do so they exercise a surprising control in unexpected places — as 
we shall see: we will also see that the harmonic series, and others related to it, 
enjoy an important existence of their own. 

The book naturally separates into two parts: Chapters 1-11 might be described
as ‘theory’, and the remainder as ‘practice’.

In the ‘theory’ part we are concerned with definitions and some consequences 
of them, methods to approximate, and to some extent, with preparation for the 
remaining chapters. We start by looking at the peculiar way in which logarithms 
were initially defined, a way which reveals the immense intellectual effort that 
must have been invested to turn multiplication into addition, to utilize an idea 
from the old world that helped to usher in the new. The harmonic series, with 
its three peculiar properties, is discussed and then its specializations and generalizations,
before looking more closely at that definition of γ and having done
that, and having convinced ourselves that the number actually exists, at ways of 
approximating its value, using both decimal and fractional methods. Among all 
of this we prove a barely credible result about co-prime integers and establish 
an identity (of Euler’s) that holds the key to the modern study of prime numbers. 

The later chapters, which are devoted to ‘practice’, look at some of the ways 
in which the three objects of our attention can appear in mathematics, and 
to some extent, in applications of it. Gamma’s varied roles in analysis and 
number theory are mentioned, some surprising appearances of the harmonic 
series are discussed, and three such appearances of logarithms. The finale is really just 
another application of logarithms, but since the application is the Prime Number 
Theorem, leading to the Riemann Hypothesis (neither of which we prove!), it 
is deservedly singled out. It is inevitable that our journey reaches mathematics 
that is ‘worthy of serious consideration’, as Euler himself said of γ, but none 
is more worthy than that celebrated Prime Number Theorem and that awesome 
Riemann Hypothesis; the first harnesses the wayward behaviour of the primes, 
the second adds finesse to that control by asking about the zeros of a function 
that seems to have none, but which stands alone as the greatest problem in 
mathematics today. 

How difficult is the mathematics? That of course is a subjective matter. Certainly,
we have not shied away from the use of symbols, since to do so would
have condemned us merely to talking about mathematics rather than actually
doing it. Yet few really advanced techniques are used; it is more that
in some places simple ideas have been used in advanced ways. Mathematics
makes a nice distinction between the usually synonymous terms ‘elementary’ 
and ‘simple’, with ‘elementary’ taken to mean that not very much mathematical 
knowledge is needed to read the work and ‘simple’ to mean that not very much 
mathematical ability is needed to understand it. In these terms we think the 
content is often elementary but in places not so very simple. The reader should 
expect to make use of a pen and paper in many places; mathematics is not a 
spectator sport! The approach is reasonably rigorous but informal, as this is no
textbook; it is more a context book of mathematics in which the reader is asked
to take time out from studying the mathematics to read a little around it and
about the mathematicians who produced it or of the times in which they lived;
sometimes in detail but other times just a few lines and then not always, as this
is no history of mathematics book either; it merely acknowledges that mathematics
comes from mathematicians, not books, and seeks to bring a sometimes
shadowy figure forward to share the prominence of his ideas, and to give some 
sort of feel for the way in which those ideas developed over time. 

The exception to the ‘elementary’ classification is some of the content of the 
final chapter on the Riemann Hypothesis; necessarily, this involves some complex
function theory and in particular complex differentiation and integration.
To those who have met these ideas the work should present few problems, but 
to those who have not they will look rather frightening; if so, simply ignore 
them or better still try to find out about them since they are a most glorious and 
powerful construction; a ‘crash course’ in some elements of complex function 
theory is included in Appendix D. The Riemann Hypothesis really is the greatest 
unsolved problem in mathematics, so it shouldn’t be surprising that it is neither 
‘elementary’ nor ‘simple’; if the chapter entices hunger in some to get to grips 
with Cauchy’s great invention it will have justified itself on that ground alone. 

We hope that the material will appeal to a variety of people who have a little 
probability and statistics and a good calculus course behind them, and before 
that a rigorous course in algebra, if such a thing still exists: the motivated senior 
secondary student, who may well be seeing many of the ideas for the first time, 
the college student for whom the text may put flesh on what can sometimes be 
dry bones, the teacher for whom it might be a convenient synthesis of some nice 
ideas (and maybe the makings of a talk or two), and also those who may have 
left mathematics behind and who wish to remind themselves why they used to 
find it so fascinating. The reader will judge to what extent this book achieves 
its aim: to explain interesting mathematics interestingly. 

The names of many mathematicians appear, names that should bring wonder 
to anyone interested in the subject and its history, but it is that name Euler that 
will force itself onto the page more than any other. It is not that we happen to 
pass through the mathematical territory to which he holds title, but more that 
it would be difficult, if not impossible, to go far in any mathematical direction 
without feeling his influence. For example, much of the notation that we now 
take for granted originates from him; in particular, e, i, f(x), A, sin x, cos x,
etc., as well as the standard manner of labelling a triangle, with the vertex the
capital letter corresponding to the opposite side’s small letter. It can be hard
to appreciate, or easy to forget, just how many important ideas his name is 
associated with or perhaps even attached to; he invented many vastly important 
concepts and touched every known area of the subject — and everything he 
touched he adorned. According to R. Calinger, ‘Euler’s books and memoirs, 
of which 873 have so far been listed, comprise approximately a third of the 
entire corpus of research on mathematics and mechanics, both rational and 
engineering, published from 1726 to 1800.’ The Opera Omnia, his collected 
works, has reached 74 volumes of 300 to 600 pages each; the final part has still 
to be finished and will comprise at least another seven volumes. Looking up 
‘Euler’ in the index of a mathematics or history of mathematics book can be a 
frustrating experience, as the eye is routinely confronted with a block of page 
references, sometimes unspecified, at other times separated into a list, which 
might begin, 

Euler angles, Euler triangle, Euler characteristic, Euler’s identity, 

Euler circle, Euler circuit, Euler-Mascheroni constant, Euler line, 

Euler numbers, Euler’s first integral, Euler’s second integral, Euler 
polynomials, Euler’s Totient function, etc., 

and continue for dozens more entries. 

And perhaps all that was needed was to know how to pronounce his name: 
‘Oiler’. 

The noun ‘genius’ has been defined as ‘exalted intellectual power, instinctive 
and extraordinarily imaginative and creative capacity’. Extravagant use of the 
word serves only to dilute its meaning or to bring into question the judgement of 
the author, but we have used it already and will risk employing it on a number of 
other occasions, no more fittingly than with Euler, safe in the conviction that if 
he was not a genius and these people were not geniuses then none have yet been 
born. Yet, to the majority, his name is probably as mysterious as his constant. He 
breathed life into γ through his Zeta functions (the generalizations of $H_n$), the 
summation of one of which was to become a long-standing problem — described 
as ‘the despair of analysts’ — until Euler’s outrageous solution put an end to it. 

With Euler and with those who preceded him and to some extent those who 
followed him we will deal with times remote from the modern years of ‘publish 
or perish’ and in consequence primacy over an initiative is often far from easy to 
establish; it might depend on a note to a contemporary or a recorded comment 
more often than an article in a learned journal, and even then that article might 
appear years after the actual breakthrough (the controversy surrounding the 
discovery of the calculus by Newton and Leibniz stands as an infamous example 
of the problems that can arise). We hope that the reader will understand if the 
story is not always complete, and agree that where it is not complete it is at 
least representative. 
Dr Urs Burckhardt, President of the Euler Commission, has written, ‘Indeed, 
through his books, which are consistently characterized by the highest striving 
for clarity and simplicity and which represent the first actual textbooks in the 
modern sense, Euler became the premier teacher of Europe not only of his time 
but well into the 19th century.’ Euler, as ever, provides a target too distant to 
reach, or even clearly to see, yet the pleasures (and frustrations) of achieving 
a fresh understanding of old ideas and realizations of new ones have proved
marvellously invigorating and have brought with them the reminder that the best
way of learning is by teaching, whether it be by the spoken or written word. We
hope that the reader will share our enthusiasm as we take brief excursions through
countries, centuries, lives and works, unfolding the stories of some remarkable 
mathematics from some remarkable mathematicians. 
CHAPTER ONE 


The Logarithmic Cradle 


The use of this book is quite large, my dear friend, 

No matter how modest it looks, 

You study it carefully and find that it gives 
As much as a thousand big books. 

John Napier (1550-1617) 


1.1 A Mathematical Nightmare — and an Awakening 

In an age when a ‘computer’ is taken to mean a machine rather than a person
and calculations of fantastic complexity are routine and executed at lightning
speed, constricting difficulties with ordinary arithmetic seem (and are)
extremely remote. The technological freeing of mathematics from the manacles
of calculation is very easy to take for granted, although the freedom has
been newly won; as recently as the mid 1970s, a mechanical calculator, slide 
rule or table of logarithms would have been used to perform anything other than 
the most basic calculations — and the user would have been grateful for them. 
In the early 17th century none of these aids existed, although it was a period of 
massive scientific advance in many fields, progress that was increasingly and 
frustratingly hampered by the overwhelming difficulties of elementary arithmetic.
Addition and subtraction were quite manageable, but how could the
much more difficult tasks of multiplication and division be simplified, let alone 
the important but formidably challenging processes of root extraction? 

Ancient civilizations had tackled the problem. For example, the Babylonians 
were known to have used the equivalent of $ab = \tfrac{1}{4}\left((a + b)^2 - (a - b)^2\right)$,
which, with a table of squares, provides some calculative help. The 16th century 
brought with it more sophisticated ideas, particularly one using the unlikely 
device of trigonometric identities, the brainchild of two German mathematicians 
named Wittich and Clavius. Various relationships between the trigonometric 
definitions were appearing throughout Europe and, for example, Francois Vieta 
(1540-1603) is known to have derived (among others) the formula 

$$\sin x \cos y = \tfrac{1}{2}\bigl(\sin(x + y) + \sin(x - y)\bigr), \qquad (1.1)$$
Figure 1.1. The medieval view of the trigonometric functions. 

where the sine of an angle meant the length of the semi-chord of a circle, as in 
Figure 1.1; it therefore depended on the radius of the defining circle. In spite of
the difficulties, extensive tables of the trigonometric functions were available,
accurate to 12 or more decimal places (although written as integers by choosing
a large whole number for the radius of the circle), their painstaking compilation
motivated by practical problems in navigation, calendar construction and
astronomy — and with the ingenuity of Wittich and Clavius they were set to 
other work. 
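The Babylonian identity needs nothing beyond a table of squares, one addition, one subtraction and a division by 4. The sketch below is a modern Python illustration, not a reconstruction of any tablet: a dictionary plays the part of the table, and its size `LIMIT` and the name `quarter_square_product` are arbitrary assumptions.

```python
# A table of squares, standing in for a clay tablet; its extent is arbitrary.
LIMIT = 200
SQUARES = {k: k * k for k in range(LIMIT + 1)}

def quarter_square_product(a, b):
    """Multiply non-negative integers using only table look-ups, an addition,
    a subtraction and a division by 4:  ab = ((a + b)^2 - (a - b)^2) / 4."""
    if a < 0 or b < 0 or a + b > LIMIT:
        raise ValueError("factors fall outside the table of squares")
    # The difference of the two squares is exactly 4ab, so // 4 loses nothing.
    return (SQUARES[a + b] - SQUARES[abs(a - b)]) // 4

print(quarter_square_product(37, 54))   # 1998
```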

The identity (1.1), with a set of trigonometric tables and scaling, could be 
used to convert multiplication to addition and subtraction (and division by 2); 
a technique known as ‘prosthaphaeresis’ (from the Greek for addition and subtraction).
Division could be managed in much the same way, using identities
for secants and cosecants. This slender aid found use wherever it became
known and nowhere more effectively than in the astronomical observatories 
of Europe, none more prestigious than Uraniborg (Castle in the Sky), on the 
island of Hven, where the Swedish-Danish Astronomer Royal, Tycho Brahe 
(1546-1601), lived and worked. And here appears a romantic story, bringing 
about a delicious serendipity. In 1590, James VI of Scotland (later to become 
James I of England) sailed to Denmark to meet his prospective wife (Anne of 
Denmark) and was accompanied by his physician, a Dr John Craig. Appalling 
weather conditions had forced the party to land on Hven, near to Brahe’s observatory,
and quite naturally the great astronomer entertained the distinguished
party until the weather cleared, partly by demonstrating to them the process 
of prosthaphaeresis. Dr Craig was Scottish and he had a particular friend who 
lived near Edinburgh: one John Napier. 
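Prosthaphaeresis can be sketched in Python under some loudly stated assumptions: `math.asin`, `math.acos` and `math.sin` stand in for look-ups in a printed table of sines, and the ‘scaling’ is taken to be division by a power of ten that brings each factor between about 0.1 and 1. The function name `prosthaphaeresis_product` is hypothetical. Once the two angles have been found, only an addition, a subtraction and a halving remain, as identity (1.1) promises.

```python
import math

def prosthaphaeresis_product(a, b):
    """Multiply two positive numbers using identity (1.1):
        sin x cos y = (sin(x + y) + sin(x - y)) / 2.
    The inverse trigonometric calls stand in for reading a table of sines backwards."""
    # Scale each factor into the range 0.1 to 1 by a power of ten (an assumed convention).
    ea = math.floor(math.log10(a)) + 1
    eb = math.floor(math.log10(b)) + 1
    sa, sb = a / 10**ea, b / 10**eb
    x = math.asin(sa)   # angle whose sine is sa
    y = math.acos(sb)   # angle whose cosine is sb
    # The multiplication itself is replaced by an addition and a halving.
    half_sum = (math.sin(x + y) + math.sin(x - y)) / 2
    # Undo the scaling.
    return half_sum * 10**(ea + eb)

print(prosthaphaeresis_product(1234, 5678))   # close to 1234 * 5678
```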

John Napier, Baron Merchiston, believed that the world would end between 
1688 and 1700, and published his belief in a 1593 polemic on Catholicism 
entitled A Plaine Discovery of the Whole Revelation of St. John; its main thesis 
was that the Pope was the Antichrist. Since the book ran to 21 editions (10 in 
his own lifetime), he had some justification in believing that this would be his 
greatest claim to posterity (such as there was to be of it); of course, he was 
wrong (on both counts) and it is for his Table (or Canon) of Logarithms and 
the two explanations of them (the 1614 Descriptio and the 1619 posthumously 
published Constructio) that he is best remembered. A massively committed 
Protestant, but no ‘crank’, he found time from his contributions to the religious 
and political ferment of the day to efficiently manage his considerable estates, 
present prophetic (and surprisingly accurate) ideas for machines of war (what 
we would call the machine gun, the tank and the submarine) and, of course, to 
study mathematics. The private manuscript ‘De Arte Logistica’ (which was not 
published until 1839) provides an insight into his mathematical interests, which 
included a study of equations (and even consideration of imaginary numbers) 
and general methods for the extraction of nth roots. 

The relationship between arithmetic and geometric behaviour, which we 
would now write as $a^n \times a^m = a^{n+m}$, had been understood since antiquity; it
is seen (for m and n positive integers) on Babylonian tablets and also in The
Sandreckoner of the great Archimedes of Syracuse (287-212 b.c.), which we
will mention again later on p. 93. In this treatise, which was dedicated to his
relative, King Gelon of Syracuse, he constructed a systematic method for representing
arbitrarily large numbers, using the number of grains of sand in the
known universe as a tangibly large number; the work provides the first hint of
the nature of logarithms. In its own way the identity also converts multiplication
to addition; now, through his friend, Napier knew that with ingenuity more calculative
aid was possible and, setting aside his study of arithmetic and algebra,
he sought to improve the lot of the scientists of his day, in effect by using this
property of exponents. Twenty years later he had succeeded. In his own words,
from the preface to the Descriptio : 

Seeing there is nothing (right well-beloved Students in the Mathematics)
that is so troublesome to Mathematicall practise, nor that
doth more molest and hinder Calculators, than the Multiplications, 
Divisions, square and cubical Extractions of great numbers, which 
besides the tedious expense of time are for the most parte subject 
to many slippery errors. I began therefore to consider in my minde 
by what certaine and ready Art I might remove those hindrances. 

And having thought upon many things to this purpose, I found 
at length some excellent briefe rules to be treated of (perhaps) 
hereafter. But amongst all, none more profitable than this which 
together with the hard and tedious Multiplications, Divisions, and 
Extractions of rootes, doth also cast away from the worke it selfe, 
even the very numbers themselves that are to be multiplied, divided 
and resolved into rootes, and putteth other numbers in their place 
which performe as much as they can do, onely by Addition and 
Subtraction, Division by two or Division by three. . . 
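The exponent rule $a^n \times a^m = a^{n+m}$ already turns multiplication into addition, provided both factors appear in a table of powers. A minimal Python sketch, assuming base 2 and 31 entries (both arbitrary choices, and `multiply_by_table` a hypothetical name): the exponents form the arithmetic progression and the powers the geometric one.

```python
BASE = 2
# Paired progressions: exponents 0, 1, 2, ... alongside powers 1, 2, 4, ...
POWERS = [BASE**n for n in range(31)]
EXPONENT_OF = {p: n for n, p in enumerate(POWERS)}

def multiply_by_table(p, q):
    """Multiply two entries of the geometric column by adding their partners
    in the arithmetic column and reading the result back from the table."""
    n, m = EXPONENT_OF[p], EXPONENT_OF[q]   # KeyError if p or q is not tabulated
    return POWERS[n + m]                    # IndexError if n + m runs off the table

print(multiply_by_table(256, 4096))   # 1048576
```

Such a table helps only when the factors and their product are entries in it, and successive powers of 2 are far too widely spaced for practical work; a much finer geometric progression is needed before every number lies close to some entry.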
1.2 The Baron’s Wonderful Canon 

We will tread some of Napier’s path by annotating a small part of the second 
publication, The Construction of the Wonderful Canon of Logarithms, usually 
abbreviated to the Constructio. 

It begins with 60 numbered paragraphs that combine to explain his approach, 
provide a limited table of logarithms, and give instruction on how to make more 
extensive ones. 

(1) A Logarithmic Table is a small table by the use of which we can obtain 
a knowledge of all geometrical dimensions and motions in space, by a 
very easy calculation. 

His first sentence suggests Napier’s interest in the practical applications of log- 
arithms, and perhaps most particularly their usefulness to astronomers such as 
Tycho Brahe, with whom he corresponded during their development. Although 
his invented word ‘logarithm’ (a compound from the Greek words meaning 
ratio and number) appeared in the title, he used ‘artificial number’ in the body 
of the text and ‘Logarithmic Table’ was written ‘Tabula Artificialis’. 

It is deservedly called very small, because it does not exceed in size a 
table of sines; very easy, because by it all multiplications, divisions, and 
the more difficult extractions of roots are avoided; for by only a very few 
most easy additions, subtractions and divisions by two, it measures quite 
generally all figures and motions. 

The ‘modesty’ of the volume is referred to and he makes clear the arithmetic 
advantages of using logarithms, but refers only to square roots here. 

It is picked out of numbers progressing in continuous proportion. 

A hint as to the method. 

(2) Of continuous progressions, an arithmetical is one which proceeds by 
equal intervals; a geometrical, one which advances by unequal and proportionally
increasing or decreasing intervals. . .

His definition of arithmetic and geometric progressions, after which he lists 
several examples. 

(3) In these progressions we require accuracy and ease in working. Accuracy 
is obtained by taking large numbers for a basis; but large numbers are 
most easily made from small by adding ciphers. Thus, instead of 100 000,
which the less experienced make the greatest sine, the more learned put
10 000 000, whereby the difference of all sines is better expressed. Wherefore
also we use the same for radius and for the greatest of our geometrical
proportionals. 
A cipher is a zero, and attaching ciphers to the right-hand side of a number does
indeed increase the size of the number. To avoid the use of fractions, it was customary
to ‘change the units’ (rather like using millimetres instead of metres).
The ‘greatest sine’ is the radius of the circle, achieved when a = 90° in Figure 1.1,
and Napier chooses to represent this as $10^7$ units, rather than a mere
$10^5$.
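The claim in paragraph (3), that a radius of 10 000 000 lets ‘the difference of all sines’ be ‘better expressed’, can be checked numerically. In this sketch the table interval of one minute of arc, the rounding to whole units and the use of `math.sin` are our assumptions, and the function names are hypothetical; it counts how often two neighbouring entries of the table are equal and so fail to distinguish their angles.

```python
import math

def sine_table(radius):
    """Tabulated 'sines' (semi-chords) for every minute of arc from 0° to 90°
    in a circle of the given radius, rounded to whole units."""
    return [round(radius * math.sin(math.radians(m / 60)))
            for m in range(90 * 60 + 1)]

def indistinguishable_steps(radius):
    """Count neighbouring minutes whose tabulated sines are equal."""
    table = sine_table(radius)
    return sum(1 for s, t in zip(table, table[1:]) if s == t)

for radius in (10**5, 10**7):
    print(radius, indistinguishable_steps(radius))
```

With a radius of 100 000 many neighbouring entries close to 90° coincide, because the sine changes very slowly near its greatest value; with 10 000 000 almost none do.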

(4) In computing tables, these large numbers may again be made still larger 
by placing a period after the number and adding ciphers. Thus in commencing
to compute 10 000 000 we put 10 000 000.000 000 0, lest the
most minute error should become very large by frequent multiplication. 

Here he acknowledges the dangers of compounding rounding errors and intro- 
duces the use of the decimal point to help cope with them. His idea is that, even 
though the final logarithm will be rounded off to an integer, the intermediate 
calculations should involve as much accuracy as possible. 
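The warning in paragraph (4) about minute errors growing through repeated operations can be illustrated with a small experiment. Purely as an assumption for illustration, take a geometric progression that starts at 10 000 000 and in which each term is the previous one less a ten-millionth part of itself, so that every step is a subtraction; after each step only a fixed number of places after the decimal point are kept, the rest being truncated. The function name `progression` is hypothetical, and exact rational arithmetic from Python's `fractions` module supplies the true value for comparison.

```python
from fractions import Fraction

RADIUS = 10**7  # Napier's greatest sine, from paragraph (3)

def progression(steps, places):
    """Start at 10**7 and, `steps` times, subtract one ten-millionth of the
    current value, keeping only `places` digits after the decimal point."""
    scale = 10**places
    x = RADIUS * scale          # the value counted in units of 10**-places
    for _ in range(steps):
        x -= x // RADIUS        # x / 10**7, truncated to the working precision
    return Fraction(x, scale)

exact = RADIUS * (1 - Fraction(1, RADIUS)) ** 100
print(float(progression(100, 0)))   # whole numbers only
print(float(progression(100, 7)))   # seven extra places, as paragraph (4) advises
print(float(exact))
```

With whole numbers only, the amount to subtract at the second step, 9 999 999 divided by 10 000 000, truncates to zero, so the progression stalls at 9 999 999 instead of falling by about one unit per step; carrying seven extra places keeps the hundredth term within 0.000 01 of the exact value.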

Extending the Hindu place-value number system to include decimals had 
been one of the conceptual and notational difficulties of mathematics, and one 
of its most important developments. In 1530 one Christof Rudolff (1499-1545)
used a form of decimal fractions in a published collection of arithmetic examples;
he also brought to the mathematical world the radical sign for the square
root. It was, though, the multi-faceted Dutch scientist, Simon Stevin (1548-1620),
who is accepted to have championed the use of decimal places more
than anyone before him, since in 1585 he produced the first known systematic
presentation of the rules for manipulating them in the treatise De Thiende. His
ideas soon reached a far greater audience when the book was quickly translated 
from Dutch to French to become La Disme, which has the subtitle ‘Teaching 
how all computations that are met in business may be performed by integers 
alone without the aid of fractions’. More of a pamphlet than a book, it has a
quite splendid introduction that resonates with Napier’s thoughts.

To astrologers, surveyors, measurers of tapestry, gaugers, stereometers
in general, mintmasters and to all merchants, Simon Stevin
sends greeting. 

A person who contrasts the small size of this book with your greatness,
my most honourable sirs to whom it is dedicated, will think
my idea absurd, especially if he imagines that the size of this volume
bears the same ratio to human ignorance that its usefulness
has to men of your outstanding ability; but in so doing he will have 
compared the extreme terms of the proportion which may not be 
done. Let him rather compare the third term with the fourth. 

What is it here that is being propounded? Some wonderful invention?
Hardly that, but a thing so simple that it scarce deserves the
name invention; for it is as if some stupid country lout chanced 
upon great treasure without using any skill in the finding. If any
one thinks that, in expounding the usefulness of decimal numbers,
I am boasting of my cleverness in devising them, he shows without
doubt that he has neither the judgement nor the intelligence to
distinguish simple things from difficult, or else that he is jealous of
a thing that is for the common good. However this may be, I shall
not fail to mention the usefulness of these numbers, even in the
face of this man’s empty calumny. But, just as the mariner who has
found by chance an unknown isle, may declare all its riches to the
king, as, for instance, its having beautiful fruits, pleasant plains,
precious minerals etc., without its being imputed to him as deceit;
so may I speak freely of the great usefulness of this invention,
a usefulness greater than I think any of you anticipates, without
constantly priding myself on my achievements.

Figure 1.2. Points $A, P_1, P_2, \ldots, P_r, P_{r+1}, B$ on a line.

His notation varied from very to reasonably cumbersome, for example, 3⓪1①4②2③,
3 / 142 and 3—. Napier was not consistent with his own notation but his
use of the decimal point in the Constructio was to bring about a standardization, 
at least to some extent; even today, the Americans would usually write 3.142, 
the Europeans 3,142 and the English 3·142. Certainly, decimal is far superior to 
fractional notation when comparing sizes — and composing tables — and it was 
Napier’s tables of logarithms that did most to popularize this crucial initiative. 

(5) In numbers distinguished thus by a period in their midst, whatever is
written after the period is a fraction, the denominator of which is unity
with as many ciphers after it as there are figures after the period.

Thus $10000000.04$ is the same as $10000000\tfrac{4}{100}$. . .

The original Descriptio did not include explicit use of decimals. He continues 
to give several examples of the meaning of decimal notation. 
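Rule (5) is mechanical enough to state as a short routine. The sketch below is a minimal Python illustration (the function name `decimal_to_fraction` is our own, hypothetical choice) that reads a numeral containing a period exactly as the rule prescribes and returns an exact fraction.

```python
from fractions import Fraction

def decimal_to_fraction(numeral: str) -> Fraction:
    """Interpret a numeral 'distinguished by a period' as in rule (5):
    the figures after the period form a fraction whose denominator is
    unity followed by as many ciphers (zeros) as there are such figures."""
    whole, _, figures = numeral.partition(".")
    denominator = 10 ** len(figures)          # 1 followed by len(figures) zeros
    numerator = int(figures) if figures else 0
    return int(whole) + Fraction(numerator, denominator)

# Napier's example: 10000000.04 is 10000000 and 4/100
print(decimal_to_fraction("10000000.04") == 10000000 + Fraction(4, 100))
```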

The next paragraph to interest us is 

(25) Whence a geometrically moving point approaching a fixed one has its 
velocities proportionate to its distances from the fixed one. . . 

A lengthy rhetoric follows, referring to the equivalent of Figure 1.2, to establish
that if a point P starts at A and moves continuously towards B in such a way
that $BP_r : BP_{r+1}$ is constant (and therefore moving ‘geometrically’), then that
constant is the ratio of the point’s velocities at $P_r$ and $P_{r+1}$: that is,
$V_r : V_{r+1} = BP_r : BP_{r+1}$.
Figure 1.3. [The line from $O$ with points $Q_1, Q_2, Q_3, \ldots, Q_r$.]

To establish this, Napier considered the motion of P over equal time intervals
of length t and implicitly approximated the varying speed over each interval
by its value at its starting point, as we might do in step-by-step solutions of
differential equations. In modern notation, suppose that at some stage P is at
position $P_r$ and that at some fixed time t later it is at $P_{r+1}$; then
$BP_r = BP_{r+1} + P_rP_{r+1} = BP_{r+1} + V_r t$, using the above approximation.
Since $BP_{r+1} : BP_r = k$, $BP_r = kBP_r + V_r t$ and $V_r = (1/t)(1 - k)BP_r$.
Of course, this means that $V_{r+1} = (1/t)(1 - k)BP_{r+1}$ and so
$V_{r+1} : V_r = BP_{r+1} : BP_r$, as required. In a
sense he was, of course, on subtle mathematical ground here, with the hint of
instantaneous velocity, a concept that was to be dealt with by Newton seventy
years in the future.
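The step-by-step argument can be replayed numerically. In this minimal sketch (the function name, the ratio k = 0.9 and the step t = 0.1 are arbitrary illustrative choices), each step moves P at the speed forced by $BP_r = kBP_r + V_r t$, and the ratio of successive speeds comes out equal to the ratio of successive distances.

```python
def geometric_motion(radius: float, k: float, t: float, steps: int):
    """Simulate Napier's point P in equal time steps t, holding the speed over
    each step at its starting value V_r, chosen so that BP_{r+1} = k * BP_r.
    Returns the distances BP_0..BP_steps and the speeds V_0..V_{steps-1}."""
    distances = [radius]
    speeds = []
    for _ in range(steps):
        bp = distances[-1]
        v = (1 - k) * bp / t            # from BP_r = k*BP_r + V_r*t
        speeds.append(v)
        distances.append(bp - v * t)    # BP_{r+1} = BP_r - V_r*t
    return distances, speeds

dist, vel = geometric_motion(radius=10**7, k=0.9, t=0.1, steps=5)
for r in range(len(vel) - 1):
    print(vel[r + 1] / vel[r], dist[r + 1] / dist[r])  # the two ratios agree
```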

(26) The logarithm of a given sine is that number which has increased arith- 
metically with the same velocity throughout as that with which radius 
began to decrease geometrically, and in the same time as radius has 
decreased to the given sine. 

This crucial paragraph defines his version of logarithm. Firstly, referring to
Figure 1.2, AB is taken to be the ‘radius’ of length $10^7$ and the possible values
of $\sin\alpha$ are represented by distances along the line from B, with the whole $10^7$
at A and 0 at B. The point P starts at A and moves towards B with a speed
numerically equal to its distance from B, which means that its initial speed is
$10^7$ and its final speed 0 (although this is impossible to achieve). The key to the
whole matter is his introduction of a second, infinite line to represent the motion
of another point Q, starting at the same time as P from an origin O but moving
continuously with a constant velocity of $10^7$ (see Figure 1.3). He defines a set of
points $Q_r$ along this second line by the following: $Q_r$ is the point reached by Q
just as P reaches $P_r$; since the time intervals are equal and Q moves at constant
speed, the intervals between the $Q_r$ will all be equal and its motion ‘arithmetic’.
The $OQ_r$ are defined to be the logarithms of the corresponding $BP_r$, which we
will write as $OQ_r = \mathrm{NapLog}(BP_r)$.

If we start to construct his table of logarithms, the implications of all this 
become more clear. 

In the first time interval t, P moves to $P_1$, where
$BP_1 = 10^7 - AP_1 = 10^7 - 10^7 t = 10^7(1 - t)$, approximating its speed over the
interval by its initial speed of $10^7$. During this time, Q will have moved to $Q_1$,
where $OQ_1 = 10^7 t$, which means that $\mathrm{NapLog}\{10^7(1 - t)\} = 10^7 t$.
Repeating this analysis for the next time interval gives
$BP_2 = 10^7 - AP_2 = 10^7 - (AP_1 + P_1P_2) = 10^7 - 10^7 t - V_1 t = 10^7(1 - t) - V_1 t$.
Now we use the result of the previous paragraph to get $V_1 : 10^7 = BP_1 : 10^7$ and
therefore $V_1 = BP_1 = 10^7(1 - t)$,
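which means that $BP_2 = 10^7(1 - t) - 10^7(1 - t)t = 10^7(1 - t)^2$. Since
$OQ_2 = 10^7 \times 2t = 2(10^7 t)$, we have that $\mathrm{NapLog}\{10^7(1 - t)^2\} = 2(10^7 t)$.
And so the process continues. In effect, he then takes $t = 1/10^7$ to get

$$\mathrm{NapLog}\left\{10^7\left(1 - \frac{1}{10^7}\right)\right\} = \mathrm{NapLog}(9\,999\,999) = 1,$$

$$\mathrm{NapLog}\left\{10^7\left(1 - \frac{1}{10^7}\right)^2\right\} = \mathrm{NapLog}(9\,999\,998) = 2,$$

and, in general,

$$\mathrm{NapLog}\left\{10^7\left(1 - \frac{1}{10^7}\right)^r\right\} = r, \qquad r \in \mathbb{N}.$$

And, using the fact that the motion is continuous,

$$\mathrm{NapLog}\left\{10^7\left(1 - \frac{1}{10^7}\right)^L\right\} = L$$

for any positive $L$.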
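Napier's table can be generated directly from this rule. The following Python sketch (`naplog_table` is our own, hypothetical name) repeatedly multiplies the radius by $1 - 1/10^7$ and pairs each result with the number of steps taken, which is its NapLog. The printed arguments are 10 000 000, 9 999 999 and then values a hair above 9 999 998 and 9 999 997, showing how close the early entries are to whole numbers.

```python
RADIUS = 10**7                 # Napier's 'radius'
RATIO = 1 - 1 / RADIUS         # the factor (1 - 1/10^7) applied at each step

def naplog_table(steps: int):
    """Return pairs (BP_r, r) with BP_r = 10^7 (1 - 1/10^7)^r, so that
    NapLog(BP_r) = r for r = 0, 1, ..., steps."""
    table = []
    bp = float(RADIUS)
    for r in range(steps + 1):
        table.append((bp, r))
        bp *= RATIO
    return table

for bp, r in naplog_table(3):
    print(f"NapLog({bp:.7f}) = {r}")
```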


The last paragraph that we will consider is 

(27) Whence nothing is the logarithm of the radius. . . 

$BA = 10^7$ is the ‘radius’ and, with P = A and Q = O, $\mathrm{NapLog}(10^7) = 0$.

The process can be thought of as taking powers of (1 — 1/10 7 ), a number 
close to 1 , which makes the powers close together and interpolation between 
them comparatively accurate; the factor of 10 7 eliminates the decimals. The 
Constructio continues to give methods of interpolation to fill in the gaps along 
AB, and in particular Napier notes that the geometric mean of two numbers 
corresponds to the arithmetical mean of their logarithms, which is true since if 
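$L_1 = \mathrm{NapLog}\,N_1$ and $L_2 = \mathrm{NapLog}\,N_2$, then

$$\sqrt{N_1 \times N_2} = \sqrt{10^7\left(1 - \frac{1}{10^7}\right)^{L_1} \times 10^7\left(1 - \frac{1}{10^7}\right)^{L_2}} = 10^7\left(1 - \frac{1}{10^7}\right)^{(L_1 + L_2)/2}$$

and so

$$\mathrm{NapLog}\left(\sqrt{N_1 \times N_2}\right) = \tfrac{1}{2}(L_1 + L_2).$$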
It is easy to see that another important observation for their construction holds:
if $N_1 : N_2 = N_3 : N_4$, then
$\mathrm{NapLog}(N_1) - \mathrm{NapLog}(N_2) = \mathrm{NapLog}(N_3) - \mathrm{NapLog}(N_4)$.
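Both properties can be checked numerically. The sketch below assumes the continuous rule $\mathrm{NapLog}\{10^7(1 - 1/10^7)^L\} = L$ and simply solves $N = 10^7(1 - 1/10^7)^L$ for $L$; the helper name `naplog` and the sample values are illustrative choices, not Napier's.

```python
import math

RADIUS = 10**7
LOG_RATIO = math.log1p(-1 / RADIUS)   # ln(1 - 1/10^7), computed accurately

def naplog(n: float) -> float:
    """Continuous NapLog: the L for which n = 10^7 (1 - 1/10^7)^L."""
    return math.log(n / RADIUS) / LOG_RATIO

# Geometric mean of the numbers <-> arithmetic mean of their logarithms
n1, n2 = 9_000_000, 4_000_000
gm = math.sqrt(n1 * n2)
print(naplog(gm), (naplog(n1) + naplog(n2)) / 2)

# Equal ratios N1:N2 = N3:N4 <-> equal differences of logarithms
n3, n4 = 8_000_000, 6_000_000
n5, n6 = 4_000_000, 3_000_000
print(naplog(n3) - naplog(n4), naplog(n5) - naplog(n6))
```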

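A small variation of the reasoning exposes the use of logarithms as a calculative
aid:

$$N_1 \times N_2 = 10^7\left(1 - \frac{1}{10^7}\right)^{L_1} \times 10^7\left(1 - \frac{1}{10^7}\right)^{L_2} = 10^7 \times 10^7\left(1 - \frac{1}{10^7}\right)^{L_1 + L_2},$$

which makes

$$\frac{N_1 \times N_2}{10^7} = 10^7\left(1 - \frac{1}{10^7}\right)^{L_1 + L_2}$$

and so

$$\mathrm{NapLog}\left(\frac{N_1 \times N_2}{10^7}\right) = L_1 + L_2 = \mathrm{NapLog}\,N_1 + \mathrm{NapLog}\,N_2,$$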


and the familiar multiplicative law of logarithms emerges in a modified but still 
useful form, differing only in the position of the decimal point: multiplication 
had been transformed into addition. Napier noted that this ‘functional relation- 
ship’ satisfied by his logarithms allows the logarithm of any whole number to 
be calculated from knowing the logarithms of its prime factors, with primes 
making the first appearance of many in this book. As the gaps were filled, so the 
multiplication of a greater variety of numbers could be changed into their addi- 
tion and the ‘Wonderful Canon’ be seen as the momentous aid to calculation that 
it was; Napier will forever be remembered as the discoverer of logarithms. He 
had built a new bridge that connected problems of multiplication and division 
to problems of addition and subtraction; ‘prosthaphaeresis’ had come of age. 
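To see multiplication turned into addition, here is a sketch that multiplies two numbers using only Napier-style logarithms, one addition and an 'antilogarithm'. As before, the continuous NapLog is assumed, and `naplog`, `antinaplog` and `multiply_via_naplog` are hypothetical helper names; the final factor of $10^7$ undoes the shift of the decimal point noted above.

```python
import math

RADIUS = 10**7
LOG_RATIO = math.log1p(-1 / RADIUS)   # ln(1 - 1/10^7)

def naplog(n: float) -> float:
    """L such that n = 10^7 (1 - 1/10^7)^L."""
    return math.log(n / RADIUS) / LOG_RATIO

def antinaplog(L: float) -> float:
    """Inverse of naplog: 10^7 (1 - 1/10^7)^L."""
    return RADIUS * math.exp(L * LOG_RATIO)

def multiply_via_naplog(n1: float, n2: float) -> float:
    total = naplog(n1) + naplog(n2)      # multiplication becomes addition
    return RADIUS * antinaplog(total)    # antinaplog gives N1*N2/10^7

print(multiply_via_naplog(5_000_000, 2_500_000))   # close to 1.25e13
```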

Unfortunately, the name of the Swiss Jobst Bürgi (1552-1632) has slipped
into obscurity, yet he had independently thought of the same idea, with a method 
differing only in detail. The most famous clockmaker of his time, a maker of 
scientific instruments and algebraic tutor to Johannes Kepler (1571-1630), he 
had published his method only in 1620, although it is clear that he was thinking 
of the ideas as early as 1588. It would take until 1707 for the birth of another 
Swiss who would leave an indelible mark on logarithms, and almost all other 
branches of mathematics: Euler. 

The Descriptio opens with the verse at the head of this chapter, which plainly 
and amusingly demonstrates Napier’s optimistic view of his invention, and 
indeed it met with immediate and considerable acclaim, convincingly summed 
up in the words of John Keill (1672-1721), Fellow of The Royal Society and 
Savilian Professor of Astronomy at Oxford: 
The Mathematicks formerly received considerable Advantages;
first by the Introduction of the Indian Characters, and afterwards by
the Invention of Decimal Fractions; yet has it since reaped at least
as much from the Invention of Logarithms, as from both the other 
two. The Use of these, every one knows, is of the greatest Extent, 
and runs through all Parts of Mathematicks. By their Means it is 
that Numbers almost infinite, and such as are otherwise impracti- 
cable, are managed with Ease and Expedition. By their assistance 
the Mariner steers his Vessel, the Geometrician investigates the 
Nature of higher Curves, the Astronomer determines the Places of 
the Stars, the Philosopher accounts for other Phenomena of Nature; 
and lastly, the Usurer computes the Interest of his Money. 

The work found the particular affections of Henry Briggs (1561-1630), who had 
become the first Professor of Geometry at Gresham College, London, in 1596; 
in 1620 he was to become the first occupant of the Savilian Chair of Geometry 
in Oxford; later we will meet the great G. H. Hardy, who held that prestigious 
post some 300 years later — and offered it as a prize! Briggs’s interest in the 
study of eclipses in particular and calculative aids in general naturally attracted 
him to Napier’s idea and in a letter dated 10 March 1615 to his friend James 
Ussher, he wrote 

. . . wholly employed about the noble invention of logarithms, then 
lately discovered. . . Napper, lord of Markinston, hath set my head 
and hands a work with his new and admirable logarithms. I hope
to see him this summer, if it please God, for I never saw a book 
which pleased me better or made me more wonder. 

The meeting did take place that summer, with Briggs the guest of Napier for a 
month, and another followed in 1616; Napier’s death in April 1617 prevented the
planned arrangement for a third year. Over that time they discussed variations 
on the idea and came to agree on the suggestion of Napier that ‘0 should be 
made the logarithm of 1 and 100000 &c the logarithm of the radius’. The 
Constructio continues with an appendix by Briggs (it was he who undertook to 
arrange the publication of its London edition) entitled ‘On the Construction of 
another and better kind of Logarithms, namely one in which the Logarithm of 
unity is 0’. That important step taken, he continued in the first paragraph with
‘. . . and 10000000000 as the logarithm of either one tenth unity or ten times 
unity. . . ’; the final form had yet to be reached. In the end, of course, it was 
to be that the logarithm of 1 would be 0, and the logarithm of 10 would be 1, 
and so the tables of logarithms that were to be used for the next 350 years, the 
Briggsian logarithms, came into being. 

With Napier’s decline and death, it fell to Briggs to calculate the new tables 
and as early as 1617 he had published Logarithmorum Chilias Prima, consisting
of the logarithms of the natural numbers from 1 to 1000. In 1624 he published
the formidably detailed Arithmetica Logarithmica, in which he developed far 
more comprehensive tables and formulated means of calculating whole classes 
of logarithms (and of putting them to use). Of course, gaps remained and the 
calculations involved in filling them could be prohibitive; Edward Wright, a 
translator of Napier’s work, remarked that sometimes finding the logarithm of a 
number was more troublesome than performing the calculation without them! 
Briggs even suggested that the logarithms should be computed by teams of 
people, and he offered to supply specially designed paper for the purpose. 

It is interesting to note that the first recorded use of ‘×’ for multiplication
appeared in an anonymous appendix to Edward Wright’s 1618 translation of the 
Description thought to have been authored by William Oughtred (1574-1660), 
the inventor of the slide rule. 


1.3 A Touch of Kepler 

One of the most immediate and significant uses to which logarithms were put 
was, unsurprisingly, in astronomy. In 1601, on the death of the fractious Brahe, 
Kepler was promoted to take his place. Not only did he inherit his master’s 
prestigious position but also his voluminous and incredibly accurate data, which 
he used to help him conduct his ‘war with Mars’, a war that he eventually won 
and from which he extracted his first two laws of planetary motion. 

1. Planets move in ellipses, with the Sun at one focus and the other empty. 

2. The radius vector describes equal areas in equal times. 

The results relating to Mars were published in Astronomia Nova of 1609 and 
were later extended to the other planets, but his suspicion that there was a simple 
law relating the size of the orbits to the period of the planets remained just that 
for many years. In his own words: 

. . . and if you want the exact moment in time, it was conceived 
mentally on 8th March in this year one thousand six hundred and 
eighteen, but submitted to calculation in an unlucky way, and there- 
fore rejected as false, and finally returning on the 15th of May and 
adopting a new line of attack, stormed the darkness of my mind. 

So strong was the support from the combination of my labour of 
seventeen years on the observations of Brahe and the present study, 
which conspired together, that at first I believed I was dreaming, and 
assuming my conclusion among my basic premises. But it is abso- 
lutely certain and exact that the proportion between the periodic 
times of any two planets is precisely the sesquialterate proportion 
of their mean distances. . . 
Figure 1.4. The log-log plot revealing Kepler’s third law. [Horizontal axis: semimajor axis (AU).]

Figure 1.5. A planet’s elliptical orbit.


He had published the result in his 1619 Harmonice Mundi as a late but important
addition to the book, which was already at press when he finally discovered that
$T \propto D^{3/2}$. Put another way, ‘the square of the period is proportional to the cube
of the average distance of the planet from the Sun’ . How did he finally manage 
to discover the law? It is not clearly documented, but in 1616 he had read the 
Description and as a result logarithms would surely have helped him to see the 
hidden pattern. 

In modern terms, a plot of $\log T$ against $\log D$ would yield the straight line of
gradient $\tfrac{3}{2}$ in Figure 1.4: it is all so obvious, retrospectively!
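The straight line is easy to reproduce. The sketch below fits a least-squares line to $\log_{10} T$ against $\log_{10} D$ for the six planets known to Kepler. The orbital figures are rounded modern values that we supply for illustration (they are not Brahe's data), and the fitted gradient comes out very close to $3/2$.

```python
import math

# Rounded modern values (semimajor axis in AU, sidereal period in years),
# supplied here for illustration; they are not Brahe's or Kepler's figures.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

def loglog_slope(data: dict) -> float:
    """Least-squares gradient of log10(T) against log10(D)."""
    xs = [math.log10(d) for d, _ in data.values()]
    ys = [math.log10(t) for _, t in data.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

print(f"gradient = {loglog_slope(PLANETS):.3f}")
```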

The average distance D is more easily realized as the length of the semimajor
axis of the elliptical orbit. A little calculus establishes this.

Referring to Figure 1.5, by the definition of an ellipse, $x + y = 2a$, and so

$$\int_0^{2\pi} (x + y)\,d\theta = \int_0^{2\pi} 2a\,d\theta = 4\pi a.$$

So, since by symmetry $\int_0^{2\pi} x\,d\theta = \int_0^{2\pi} y\,d\theta$,

$$\int_0^{2\pi} x\,d\theta + \int_0^{2\pi} y\,d\theta = 2\int_0^{2\pi} x\,d\theta = 4\pi a.$$

Therefore, the average value of the distance of the planet from the Sun is

$$\frac{1}{2\pi}\int_0^{2\pi} x\,d\theta = \frac{2\pi a}{2\pi} = a.$$
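The step that equates $\int x\,d\theta$ with $\int y\,d\theta$ needs an angle for which the two focal distances are mirror images of each other. A minimal numerical check, assuming θ is the eccentric anomaly E, so that $x = a(1 - e\cos E)$ and $y = a(1 + e\cos E)$ (this parametrization is our assumption, not stated in the text), confirms that the average of x is a; with other choices of angle the average can differ. The values a = 2, e = 0.6 are arbitrary.

```python
import math

def mean_focal_distance(a: float, e: float, samples: int = 100_000) -> float:
    """Average of x(E) = a(1 - e cos E) over the eccentric anomaly E in
    [0, 2*pi), by the midpoint rule. Here y(E) = a(1 + e cos E), so x + y = 2a."""
    h = 2 * math.pi / samples
    total = 0.0
    for i in range(samples):
        E = (i + 0.5) * h
        total += a * (1 - e * math.cos(E))
    return total * h / (2 * math.pi)

print(mean_focal_distance(a=2.0, e=0.6))   # very close to a = 2.0
```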


Whatever the facts with the third law, it is certain that Kepler used (and indeed 
justified and developed) logarithms for the production of the 1628 Rudolphine 
Tables of planetary positions, which itself contained a set of his own form of 
logarithms to eight-figure accuracy. In the words of Pierre Laplace (1749-1827) 
logarithms ‘. . . by shortening the labours, doubled the life of the astronomer’.
A poetically formed, inaccurate, but powerfully revealing, observation. 


1.4 A Touch of Euler 


The modern eye might well judge Napier’s approach to logarithms as peculiar. 
They are defined in terms of the motions of points, there is no base and the 
logarithm of 10000000 was originally 0. Altogether, they seem so distant
from what we now think of them to be, particularly as they are no longer
used for the purpose for which they were invented: to calculate. Yet, there was
an early suggestion of what we would consider logarithmic behaviour. Before
1636, Pierre Fermat (1601-1665) (among others, and whom we will meet again
in a later chapter) had shown what we would write as

$$\int_0^a x^n\,dx = \frac{a^{n+1}}{n+1}$$

for all rational numbers $n \neq -1$, but the expression for the area under the
rectangular hyperbola $y = 1/x$ continued to prove elusive. The first inkling
of the connection with logarithms seemed to appear in 1647, in Opus Geomet- 
ricum. . . , written by the Jesuit priest, Gregory St Vincent (1584-1667). The 
method of approximating areas by rectangles having an equal base was in com- 
mon currency but here St Vincent used rectangles of equal area, adjusting their 
base accordingly. 

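Referring to Figure 1.6, since the areas of the first two rectangles are equal,
$y_1(x_2 - x_1) = y_2(x_3 - x_2)$ and so

$$\frac{1}{x_1}(x_2 - x_1) = \frac{1}{x_2}(x_3 - x_2): \quad \frac{x_2}{x_1} - 1 = \frac{x_3}{x_2} - 1 \quad \text{and} \quad \frac{x_2}{x_1} = \frac{x_3}{x_2}.$$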

This means that for the areas to increase arithmetically, the x-coordinates
increase geometrically, with the strong suggestion of a logarithmic law
connecting the area under $y = 1/x$ with x. In his Waste Book of 1664, the great
Isaac Newton (1642-1727) wrote, ‘In ye Hyperbola ye area of it bears ye same
respect as its Asymptote which a logarithme doth its number.’ Both Newton and
Nicholas Mercator (1620-1687) independently developed the idea, expanding
$1/(1 + x)$ as $1 - x + x^2 - x^3 + \cdots$ and integrating term by term to finish with
the now standard expression $\log(1 + x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots$ and thereby
a much more convenient means of calculating logarithms. Logarithms were at
once the ‘artificial’ numbers of Napier, the area under the rectangular hyperbola
and the sum of an infinite series.

Figure 1.6. The hyperbola’s logarithmic behaviour.
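The Newton–Mercator series is easily summed. This sketch (function name ours) compares partial sums with Python's `math.log1p`, which computes $\ln(1 + x)$; convergence is rapid for small $|x|$ and slow near $x = 1$.

```python
import math

def mercator_log1p(x: float, terms: int) -> float:
    """Partial sum x - x^2/2 + x^3/3 - ... of ln(1 + x), valid for -1 < x <= 1."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, terms + 1))

for terms in (5, 10, 40):
    print(terms, mercator_log1p(0.5, terms), math.log1p(0.5))
print(mercator_log1p(1.0, 1000), math.log(2))   # x = 1 converges only slowly
```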

As we pass over many other individual contributions, the chasm separating 
the past and the present was filled by Euler more than anyone else; it was he 
who saw furthest. The synthesis of the several approaches to logarithms lay 
in Euler’s definition of them, which appeared in his bestselling textbook on 
algebra Complete Introduction to Algebra of 1770: 

220 Resuming the equation $a^b = c$, we shall begin by remarking that, in
the doctrine of Logarithms, we assume for the root $a$, a certain number
taken at pleasure, and suppose this root to preserve invariably its assumed
value. This being laid down, we take the exponent $b$ such that the power
$a^b$ becomes equal to a given number $c$; in which case this exponent $b$ is
said to be the logarithm of the number $c$. . .

221 We see, then, that the value of the root $a$ being once established, the
logarithm of any number, $c$, is nothing more than the exponent of that
power of $a$, which is equal to $c$; so that $c$ being $= a^b$, $b$ is the logarithm
of the power $a^b$.

His wording is cumbersome by modern standards but here is the definition of 
logarithm that confronts most people when they are introduced to them today. 
This remarkable book is made more remarkable still with the realization that at 
the time of its writing Euler was virtually blind; he dictated the manuscript to 
a servant, who was to function as his mathematical secretary. Later, he was to
establish the idea of a function and one that approaches its modern definition,
with $y = a^x$ a special case, and its inverse function defined as the logarithmic
function. Earlier, in an article of 1749 with a title that transcends language
barriers ‘De la controverse entre Messers. Leibnitz et Bernoulli sur les loga- 
rithmes des nombres negatifs et imaginaires’, he used the series expansion for 
the natural logarithm and developed ideas of complex numbers to argue that 
the logarithm of any number is multivalued. Below is his jaw-dropping argu- 
ment, which uses his famous logarithmic limit (independently discovered by 
Edmond Halley (1656-1742), of ‘comet’ fame). Using his terminology, w is an 
‘infinitely small’ number and n an ‘infinitely large’ one, with $l$ representing the
logarithm. 

Since w is ‘infinitely small’, $l(1 + w) = w$ and therefore $y = l(1 + w)^n = nw$.
Now let $x = (1 + w)^n$; then $1 + w = x^{1/n}$ and $w = x^{1/n} - 1$, which
means that $lx = y = n(x^{1/n} - 1)$. He then argued that there are n (complex)
values of $x^{1/n}$ for any x and since n is an infinite number, there must be an
infinite number of values of $lx$. He continued by pointing out that all but one
of the values would involve $\sqrt{-1}$, presaging one of the most subtle ideas of
the next century’s complex function theory, the Riemann surface. This limit,
$\ln x = \lim_{n\to\infty} n(x^{1/n} - 1)$, and the equally famed $e^x = \lim_{n\to\infty}(1 + x/n)^n$
both appear in his two-volume classic Introductio in Analysin Infinitorum of
1748, and in putting $x = -1$ in the second expression to get

$$\frac{1}{e} = \lim_{n\to\infty}\left(1 - \frac{1}{n}\right)^n$$

we can begin to unravel Napier’s thoughts.
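Both limits, and the multivaluedness Euler drew from the first, can be explored numerically with a large but finite n. In this sketch (names ours), `euler_log` computes $n(x^{1/n} - 1)$; using the k-th complex n-th root of x instead of the real one gives values close to $\ln x + 2\pi i k$, one for each choice of root, which illustrates the sense in which $lx$ has infinitely many values.

```python
import cmath
import math

def euler_log(x: float, n: int, k: int = 0) -> complex:
    """n * (x^(1/n) - 1), taking the k-th complex n-th root of x (x > 0).
    k = 0 uses the real root and approximates ln x; k != 0 gives values
    close to ln x + 2*pi*i*k."""
    root = x ** (1 / n) * cmath.exp(2j * math.pi * k / n)
    return n * (root - 1)

n = 10**6
print(euler_log(2.0, n), math.log(2.0))   # real root: close to ln 2
print(euler_log(2.0, n, k=1))             # close to ln 2 + 2*pi*i
print((1 + 1.0 / n) ** n, math.e)         # e^x limit with x = 1
print((1 - 1.0 / n) ** n, 1 / math.e)     # x = -1 gives 1/e
```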

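Since $\mathrm{NapLog}\{10^7(1 - 1/10^7)^L\} = L$, $\mathrm{NapLog}\{10^7(1 - 1/10^7)^{10^7}\} = 10^7$.
Now, $10^7$ may not be ‘infinity’ but it is quite big enough for $(1 - 1/10^7)^{10^7}$ to
be very accurately approximated by $1/e$ to get

$$10^7 = \mathrm{NapLog}\left\{10^7\left(1 - \frac{1}{10^7}\right)^{10^7}\right\} \approx \mathrm{NapLog}\left(\frac{10^7}{e}\right).$$

Now, if we scale down by a factor of $10^7$, we have that $\mathrm{NapLog}(1/e) \approx 1$,
which suggests that $\mathrm{NapLog}\,x$ might well be $\log_{1/e} x$.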

With the use of the calculus, we can be precise. 

In Figures 1.2 and 1.3, if we write $PB = x$, $OQ = y$ and take the constant of
proportionality to be 1, we have $dx/dt = -x$ and $dy/dt = 10^7$. The initial conditions
are that when $t = 0$, $x = 10^7$ and $y = 0$. These give

$$\frac{dy}{dx} = \frac{dy}{dt}\,\frac{dt}{dx} = -\frac{10^7}{x}$$

and so $y = -10^7 \ln x + c$, where $0 = -10^7 \ln 10^7 + c$, which makes

$$y = -10^7 \ln x + 10^7 \ln 10^7 = 10^7 \ln\frac{10^7}{x} \quad \text{or} \quad \frac{y}{10^7} = \ln\frac{10^7}{x}.$$

If we notice that $\ln(1/A) = \log_{1/e} A$, we finally have that

$$\frac{y}{10^7} = \log_{1/e}\frac{x}{10^7}.$$

So, Napier’s logarithms really are a scaled-down version of logs, to the base
$1/e$.
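The closed forms can be compared with Napier's discrete table. In this sketch (function names ours), `naplog_continuous` evaluates $10^7\ln(10^7/x)$ and `naplog_base_inverse_e` evaluates $10^7\log_{1/e}(x/10^7)$ using the two-argument `math.log`. For the table entries $10^7(1 - 1/10^7)^r$ both give values within about $5 \times 10^{-8}\,r$ of $r$, and $10^7/e$ gives very nearly $10^7$.

```python
import math

RADIUS = 10**7

def naplog_continuous(x: float) -> float:
    """y = 10^7 ln(10^7 / x), the solution of dx/dt = -x, dy/dt = 10^7."""
    return RADIUS * math.log(RADIUS / x)

def naplog_base_inverse_e(x: float) -> float:
    """The same quantity written as 10^7 log_{1/e}(x / 10^7)."""
    return RADIUS * math.log(x / RADIUS, 1 / math.e)

for r in (1, 2, 1000):
    entry = RADIUS * (1 - 1 / RADIUS) ** r       # table entry whose NapLog is r
    print(r, naplog_continuous(entry), naplog_base_inverse_e(entry))

print(naplog_continuous(RADIUS / math.e))        # very nearly 10^7
```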

1.5 Napier’s Other Ideas 

Napier’s major legacy is, then, a method of calculation that in its various forms 
has helped scientists and mathematicians over centuries to pursue their inves- 
tigations and theories, relatively free of the tedium of arithmetic: logarithms’ 
modern role is deeper still, as we shall see. He bequeathed some other inheri- 
tances too. 

Some of the most important practical geometrical problems of the time were 
involved with celestial navigation (with the Global Positioning System not even 
within the realms of science fiction) and therefore involved spherical triangles, 
with the Earth by then being acceptably round. Napier was recognized for two 
ideas connected with spherical trigonometry: a set of four identities useful for 
solving ‘oblique’ spherical triangles, given the name ‘Napier’s analogies’, and 
two ingenious rules for remembering the ten formulae used in solving right- 
angled spherical triangles. Both are in use today and we list them below. They 
use the now standard labelling for a triangle (spherical or plane) that capital 
letters represent the vertices and the corresponding small letters the side opposite 
(as we mentioned in the introduction, yet another inheritance from Euler) and 
it should be borne in mind that any side of a spherical triangle can be thought 
of as the angle it subtends at the centre of the defining circle. In this notation, 
Napier’s analogies are 

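$$\frac{\sin\frac{1}{2}(A - B)}{\sin\frac{1}{2}(A + B)} = \frac{\tan\frac{1}{2}(a - b)}{\tan\frac{1}{2}c}, \qquad \frac{\cos\frac{1}{2}(A - B)}{\cos\frac{1}{2}(A + B)} = \frac{\tan\frac{1}{2}(a + b)}{\tan\frac{1}{2}c},$$

$$\frac{\sin\frac{1}{2}(a - b)}{\sin\frac{1}{2}(a + b)} = \frac{\tan\frac{1}{2}(A - B)}{\cot\frac{1}{2}C}, \qquad \frac{\cos\frac{1}{2}(a - b)}{\cos\frac{1}{2}(a + b)} = \frac{\tan\frac{1}{2}(A + B)}{\cot\frac{1}{2}C}.$$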
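The analogies are easy to test numerically. This sketch builds a spherical triangle from three unit vectors of our own choosing (hypothetical, illustrative values), takes each side to be the angle it subtends at the centre, obtains the vertex angles from the spherical cosine rule, and compares the two sides of each analogy (dividing by $\cot\frac{1}{2}C$ is done as multiplying by $\tan\frac{1}{2}C$).

```python
import math

def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def central_angle(u, v):
    """Angle subtended at the centre by the arc between unit vectors u and v."""
    dot = sum(p * q for p, q in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def vertex_angle(opposite, s1, s2):
    """Spherical cosine rule: the angle opposite side `opposite`, between sides s1, s2."""
    return math.acos((math.cos(opposite) - math.cos(s1) * math.cos(s2))
                     / (math.sin(s1) * math.sin(s2)))

# Vertices of an illustrative triangle on the unit sphere
PA, PB, PC = unit((0, 0, 1)), unit((0.5, 0, 1)), unit((0.2, 0.6, 1))
a, b, c = central_angle(PB, PC), central_angle(PC, PA), central_angle(PA, PB)
A, B, C = vertex_angle(a, b, c), vertex_angle(b, c, a), vertex_angle(c, a, b)

h = 0.5
analogies = [
    (math.sin(h * (A - B)) / math.sin(h * (A + B)), math.tan(h * (a - b)) / math.tan(h * c)),
    (math.cos(h * (A - B)) / math.cos(h * (A + B)), math.tan(h * (a + b)) / math.tan(h * c)),
    (math.sin(h * (a - b)) / math.sin(h * (a + b)), math.tan(h * (A - B)) * math.tan(h * C)),
    (math.cos(h * (a - b)) / math.cos(h * (a + b)), math.tan(h * (A + B)) * math.tan(h * C)),
]
for left, right in analogies:
    print(f"{left:.12f}  {right:.12f}")
```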

If the triangle is right-angled at A, the remaining five parts (two angles and
three sides) can be arranged in order as points on a circle, as shown in Figure 1.7,
with the hypotenuse a and the angles B and C replaced by their complements
$90^\circ - a$, $90^\circ - B$ and $90^\circ - C$; each point then has two ‘adjacent’ points and two ‘opposite’ points.
Napier’s rules are, then,

• the sine of any point equals the product of the tangents of the adjacent 
points, 

• the sine of any point equals the product of the cosines of the opposite 
points. 

Moving around the circle gives five lots of two formulae. 
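The ten formulae can be checked in the same spirit. The sketch below (function names and the legs b = 0.7, c = 0.4 radians are illustrative) places the right angle A at the north pole of a unit sphere, computes the remaining parts, lists the five circular parts in order with a, B and C replaced by their complements, and tests both rules at every point of the circle.

```python
import math

def right_triangle(b: float, c: float):
    """Hypotenuse a and angles B, C of a spherical triangle right-angled at A
    with legs b = CA and c = AB: A at the pole, B and C on perpendicular meridians."""
    PB = (math.sin(c), 0.0, math.cos(c))
    PC = (0.0, math.sin(b), math.cos(b))
    a = math.acos(sum(p * q for p, q in zip(PB, PC)))      # cos a = cos b cos c
    B = math.acos((math.cos(b) - math.cos(c) * math.cos(a)) / (math.sin(c) * math.sin(a)))
    C = math.acos((math.cos(c) - math.cos(a) * math.cos(b)) / (math.sin(a) * math.sin(b)))
    return a, B, C

def check_napier_rules(b: float, c: float, tol: float = 1e-9) -> bool:
    a, B, C = right_triangle(b, c)
    q = math.pi / 2
    # circular parts in order round the triangle: c, 90-B, 90-a, 90-C, b
    parts = [c, q - B, q - a, q - C, b]
    for i in range(5):
        sine = math.sin(parts[i])
        adjacent = math.tan(parts[i - 1]) * math.tan(parts[(i + 1) % 5])
        opposite = math.cos(parts[(i + 2) % 5]) * math.cos(parts[(i + 3) % 5])
        if not (math.isclose(sine, adjacent, rel_tol=tol)
                and math.isclose(sine, opposite, rel_tol=tol)):
            return False
    return True

print(check_napier_rules(0.7, 0.4))
```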

Most famous of all is his other calculating device: Napier’s bones (or rods). 
From them came the slide-rules of Oughtred, Gunter and Mannheim and, with- 
out the silicon chip, we would be using something based on them today. It seems 
certain that the idea stems from an ancient Arabic scheme for organizing, and
therefore simplifying, multiplication: the elegant gelosia or grating method.
This (literally) romantic name is an allusion to the grid used in the method,
which resembles a type of window lattice (or gelosia) through which a jealous
spouse might peer unseen. The process starts with a blank design into which the 
two numbers to be multiplied are introduced; the example shows the product 

3284 × 6751 = 22 170 284.

The two numbers are written in the top and right semicircles (in bold in Fig- 
ure 1.8), the individual products of the digits are then written in the diago- 
nally split squares, forming a restricted ‘times table’; the answer appears in the 
remaining semicircles (the underlined digits in the figure), having been formed 
by adding the digits diagonally, starting at the bottom right — carrying over 
where necessary. 
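The gelosia procedure translates directly into code. In this sketch (function name ours), each cell's product is split into its tens digit (upper triangle) and units digit (lower triangle); the digits lying on each diagonal are added, starting from the bottom right, with carrying.

```python
def gelosia_multiply(x: int, y: int) -> int:
    """Lattice (gelosia) multiplication of non-negative integers x and y."""
    xd = [int(d) for d in reversed(str(x))]   # units digit first
    yd = [int(d) for d in reversed(str(y))]
    diagonals = [0] * (len(xd) + len(yd))
    for i, p in enumerate(xd):
        for j, q in enumerate(yd):
            tens, units = divmod(p * q, 10)
            diagonals[i + j] += units         # lower triangle of the cell
            diagonals[i + j + 1] += tens      # upper triangle of the cell
    digits, carry = [], 0
    for total in diagonals:                   # bottom-right diagonal first
        carry, digit = divmod(total + carry, 10)
        digits.append(digit)
    return int("".join(str(d) for d in reversed(digits)))

print(gelosia_multiply(3284, 6751))   # 22170284
```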

His development of this idea was published in his Rabdologia of 1617 (from 
the Greek for ‘rod’ and ‘collection’), the year of his death, and they became 
extremely popular; perhaps the rods served the needs of those for whom
logarithms were too abstract an idea. He devised several variations; some capable
of root extraction, but it is the type that deals with basic multiplication and
division that has been most widely remembered. With these, each of the 10 
possible arithmetic rods comprises a digit at the top and the multiplication table 
for that digit permanently written below, in the same way as the gelosia’s products were
written out each time; an 11th rod is simply the digits 1 to 9, written in order,
as illustrated on the left in Figure 1.9. To multiply two numbers, represent one 
of them as a row of rods with the number forming the top row (of course, that 
means repeated digits require repeated rods); 5978 on the right in Figure 1.9. 
The index rod is then placed next to the set-up and the second number multiplied 
one digit at a time and the results added, taking account of decimal positions. 
For example, in the illustration, the digit 5 is being multiplied to give 29 890 
using the same diagonal adding as with the gelosia. 
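The reading of a single row can be sketched as follows. Each rod is represented here, by our own convention, as a list of (tens, units) pairs for the multiples 1 to 9 of its digit; each rod contributes one square to the chosen row, and each diagonal sum is the units digit of one rod plus the tens digit of the rod to its right, with carrying as in the gelosia. The hypothetical `bones_multiply` then handles a many-digit multiplier one digit at a time, shifting each partial product by its place value.

```python
def rod(digit: int):
    """A rod for `digit`: its multiples 1x..9x, each split as (tens, units)."""
    return [divmod(digit * k, 10) for k in range(1, 10)]

def read_row(number: int, k: int) -> int:
    """Multiply `number` by the single digit k (1-9) by laying its rods side by
    side, taking row k and adding along the diagonals from the right."""
    row = [rod(int(d))[k - 1] for d in str(number)]    # (tens, units), left to right
    sums = []
    for i in range(len(row) - 1, -1, -1):              # right to left
        tens_on_right = row[i + 1][0] if i + 1 < len(row) else 0
        sums.append(row[i][1] + tens_on_right)         # one diagonal
    sums.append(row[0][0])                             # leftmost tens digit
    digits, carry = [], 0
    for s in sums:
        carry, digit = divmod(s + carry, 10)
        digits.append(digit)
    return int("".join(str(d) for d in reversed(digits)))

def bones_multiply(number: int, multiplier: int) -> int:
    """Multiply by each digit of `multiplier` in turn and add, by place value."""
    total = 0
    for place, d in enumerate(reversed(str(multiplier))):
        if d != "0":
            total += read_row(number, int(d)) * 10**place
    return total

print(read_row(5978, 5))   # 29890, as in the text
```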

In 1890, the French Civil Engineer, Henri Genaille, produced an elegant 
refinement which has become known as Genaille’s rods and which removed 
the need to remember the carry; they quite literally allow the user to read off 
the answer to a product of a number with a single digit with no calculation 
whatever. 

Finally, the Rabdologia contained yet another scheme for easing calculation: 
Napier’s abacus. The use of chequered (chess) boards for calculating was well 
established by Napier’s time and in his Abacus he used such a board with 
counters which take the move of a bishop or rook to perform all four arithmetic 
operations, as well as root extraction. For this he needed the ancient idea of 
multiplying by doubling; in other words, writing numbers as sums of powers of two. He
would never have realized it, but in doing so he was using binary arithmetic, 
presaging the modern computer by some 350 years. 
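The principle, though not Napier's board procedure itself, can be sketched in a few lines: read the multiplier as a sum of powers of two (its binary digits) and add up the correspondingly doubled multiplicand. The function name is ours.

```python
def multiply_by_doubling(m: int, n: int) -> int:
    """Multiply m by a non-negative integer n using only doubling and addition:
    n is read as a sum of powers of two, and for each power present the
    matching doubling of m is added to the total."""
    total = 0
    doubled = m                  # m * 2**k at the k-th step
    while n > 0:
        if n % 2 == 1:           # 2**k appears in the binary form of n
            total += doubled
        doubled += doubled       # double again
        n //= 2
    return total

print(multiply_by_doubling(3284, 6751))   # 22170284
```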





CHAPTER TWO 


The Harmonic Series 


Mathematicians are like lovers. Grant a mathematician the least principle, and 
he will draw from it a consequence which you must also grant him, and from 
this consequence another. 

Bernard Le Bovier de Fontenelle (1657-1757) 


2.1 The Principle

On 11 July 1382, in the beautiful Norman city of Lisieux, Nicholas Oresme 
died at the age of 59; he had been the city’s bishop since 1377. Born into the 
Late Middle Ages (in Allemagne in 1323), his scholarship extended from the 
development of the French language to taxation theory and his distinguished 
career included the Deanship of Rouen and being chaplain to King Charles V of 
France, for whom he translated Aristotle’s Ethics, Politics and Economics. He 
taught the heliocentric theory of Copernicus over 100 years before Copernicus 
was born and suggested graphing equations nearly 200 years before the birth 
of Descartes; his treatise De Moneta brought him the soubriquet of the greatest 
medieval economist, but it is in his research in mathematics (it is probable that 
he was the first to use ‘+’ for addition and it was he who, in his Algorismus
Proportionum , extended index notation to fractional and negative powers) and 
in particular in infinite series that our interest lies. To be exact, we are concerned 
with his work on the harmonic series and his proof of a single property of it and 
in that specialization we consciously ignore almost everything that this great 
man achieved; it is rather like remembering the inimitable Carl Frederick Gauss 
(1777-1855) for a measurement of magnetic flux. The greatest mathematician 
of them all will reappear time and again throughout the next pages; now it
is Oresme’s turn, but before that we establish a

2.2 Generating Function for $H_n$
The definition of the harmonic series,

$$H_n = \sum_{r=1}^{n} \frac{1}{r} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n},$$
is equivalent to

$$H_r = H_{r-1} + \frac{1}{r}, \quad r > 1 \quad \text{and} \quad H_1 = 1,$$

which can be used to establish the generating function:

$$\sum_{r=1}^{\infty} H_r x^r = -\frac{\ln(1 - x)}{1 - x}.$$
If we make no assumption about the $H_r$, we can multiply across by the $(1 - x)$
to get

$$-\ln(1 - x) = (1 - x)\sum_{r=1}^{\infty} H_r x^r$$

and then use Newton and Mercator’s expansion of $\ln(1 - x)$ to get

$$x + \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 + \cdots = \sum_{r=1}^{\infty} H_r x^r - \sum_{r=1}^{\infty} H_r x^{r+1}, \quad \text{assuming } |x| < 1.$$

Comparing coefficients of $x^r$, we have

$$\frac{1}{r} = H_r - H_{r-1} \quad \text{for } r > 1,$$

while the coefficient of $x$ gives $H_1 = 1$. With

$$H_r = H_{r-1} + \frac{1}{r} \quad \text{and} \quad H_1 = 1,$$

the definition is recovered and the result is established.
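The identity can be checked numerically for any $|x| < 1$. This sketch (function name ours) builds $H_r$ from the recurrence and sums $H_r x^r$; at $x = 0.5$ the partial sums approach $-\ln(1 - x)/(1 - x) = 2\ln 2$.

```python
import math

def generating_function_partial(x: float, terms: int) -> float:
    """Partial sum of sum_{r>=1} H_r x^r, building each H_r from the
    recurrence H_r = H_{r-1} + 1/r with H_1 = 1."""
    h = 0.0
    total = 0.0
    for r in range(1, terms + 1):
        h += 1 / r                 # now h = H_r
        total += h * x**r
    return total

x = 0.5
print(generating_function_partial(x, 200), -math.log(1 - x) / (1 - x))
```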

In a letter dated 15 February 1671, James Gregory (1638-1675) wrote, ‘As 
to yours, dated 24 Dec., I can hardly beleev, till I see it, that there is any general, 
compendious & geometrical method for adding an harmonical progression. . .’
To this day, we share Gregory’s disappointment, as no closed formula for $H_n$ for general
$n$ exists, nice though it would be to have. The simplicity of the definition
of the series belies its subtlety and many consequences can be drawn from it. 
Below we give three. 

2.3 Three Surprising Results 
2.3.1 Divergence 

No property is more unexpected than $H_n$’s divergence, and it is this that Oresme
proved; that is, as $n \to \infty$, $H_n \to \infty$, but so very slowly. The first 100
terms sum to 5.187 . . . , the first 1000 to 7.485 . . . and the first 1 000 000 to
14.392 . . . ; it is hard to believe that, for large enough $n$, $H_n$ will exceed any
chosen number, but such is the case; it would take a sensitive eye indeed to spot
the divergence numerically. In 1968 John W. Wrench Jr calculated the exact
minimum number of terms needed for the series to sum past 100; that number
is 15 092 688 622 113 788 323 693 563 264 538 101 449 859 497. Certainly, he
did not add up the terms. Imagine a computer doing so and suppose that it takes
it $10^{-9}$ seconds to add each new term to the sum and that we set it adding and
let it continue doing so indefinitely. The job will have been completed in not
less than $3.5 \times 10^{17}$ (American) billion years.
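The figures quoted above can be reproduced with a short sketch. It sums the first 100, 1000 and 1 000 000 terms with `math.fsum`, then uses the known approximation $H_n \approx \ln n + \gamma$, where γ ≈ 0.5772 is Euler's constant (the value below is supplied by us), to estimate where the sum passes 100 and compare the estimate with Wrench's exact count. The last lines turn that count into years at one term per nanosecond.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant, gamma (value supplied here)

def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum."""
    return math.fsum(1 / r for r in range(1, n + 1))

for n in (100, 1000, 1_000_000):
    print(n, harmonic(n))

# Wrench's count of terms needed for the sum to pass 100
WRENCH = 15092688622113788323693563264538101449859497
estimate = math.exp(100 - EULER_GAMMA)   # solve ln n + gamma = 100 for n
print(estimate / WRENCH)                  # ratio very close to 1

seconds = WRENCH * 1e-9                   # one term per nanosecond
years = seconds / (365.25 * 24 * 3600)
print(f"{years / 1e9:.2e} billion years")
```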

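Oresme’s celebrated proof, in modern notation, is shown below:

$$\begin{aligned}
H_\infty &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) \\
&\quad + \left(\frac{1}{9} + \frac{1}{10} + \frac{1}{11} + \frac{1}{12} + \frac{1}{13} + \frac{1}{14} + \frac{1}{15} + \frac{1}{16}\right) + \cdots \\
&> 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) \\
&\quad + \left(\frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16}\right) + \cdots \\
&= 1 + \frac{1}{2} + \frac{2}{4} + \frac{4}{8} + \frac{8}{16} + \cdots = 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots,
\end{aligned}$$

which is, of course, divergent.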
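Oresme's grouping amounts to the inequality $H_{2^k} \ge 1 + k/2$, which can be confirmed with exact rational arithmetic for small k. A sketch (function name ours):

```python
from fractions import Fraction

def oresme_blocks(max_k: int):
    """Yield (k, H_{2^k}, 1 + k/2) exactly. Block k holds the 2^(k-1) terms
    1/(n+1), ..., 1/(2n) with n = 2^(k-1); each is at least 1/(2n), so the
    block sums to at least 1/2."""
    h = Fraction(1)        # H_1
    n = 1
    yield 0, h, Fraction(1)
    for k in range(1, max_k + 1):
        h += sum(Fraction(1, m) for m in range(n + 1, 2 * n + 1))
        n *= 2
        yield k, h, 1 + Fraction(k, 2)

for k, h, bound in oresme_blocks(10):
    print(k, float(h), float(bound))
```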

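Inevitably, such a result has many proofs and we will consider two more.
With the pursuit of elegance as motive, suppose that $H_\infty$ is finite; then

$$H_\infty = \left(1 + \frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6}\right) + \cdots > \left(\frac{1}{2} + \frac{1}{2}\right) + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{6} + \frac{1}{6}\right) + \cdots = 1 + \frac{1}{2} + \frac{1}{3} + \cdots = H_\infty,$$

a contradiction.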


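And with deference to Euler, whose part in this story (and so many others) is
so great:

$$\begin{aligned}
\int_{-\infty}^{0} \frac{e^x}{1 - e^x}\,dx &= \int_{-\infty}^{0} e^x\left(1 - e^x\right)^{-1} dx = \int_{-\infty}^{0} e^x\left(1 + e^x + e^{2x} + e^{3x} + \cdots\right) dx \\
&= \int_{-\infty}^{0} \left(e^x + e^{2x} + e^{3x} + \cdots\right) dx \\
&= \left[e^x + \tfrac{1}{2}e^{2x} + \tfrac{1}{3}e^{3x} + \cdots\right]_{-\infty}^{0} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots = \left[-\ln\left(1 - e^x\right)\right]_{-\infty}^{0},
\end{aligned}$$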
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which is clearly infinite when evaluated at the upper limit. There we have it, with 
its improper integral and non-legitimate binomial expansion; it can be tidied of 
course, but forcing the detail would blur its sweeping stylishness. 

2.3.2 $H_n$ is non-integral

The second surprise is that, even though $H_n$ increases without bound, it manages
to avoid all integers in doing so (apart from $n = 1$) and, more than this, no sum of
two or more consecutive terms of the series is ever an integer. That is, for positive
integers $m, n$ with $m < n$,

$$S_{mn} = \frac{1}{m} + \frac{1}{m+1} + \frac{1}{m+2} + \cdots + \frac{1}{n}$$

is never an integer.

The argument we give is delicate and a bit wordy, but the method of proof is
to show that $S_{mn}$ is a fraction with an odd numerator and an even denominator,
which ensures that it cannot be integral. To this end, we need an intermediate
result, which itself is a little surprising at first.

In any finite, consecutive subsequence of the sequence 1, 2, 3, ..., there is
a unique term divisible by the highest power of 2. That is, if we factorize each term
and focus on the factors that are powers of 2, exactly one term contains the highest
power of 2. The following argument establishes this.

If the sequence contains powers of 2, the largest of them is the number we seek.
Otherwise, the sequence is contained strictly between two consecutive powers of 2,
say $2^\alpha$ and $2^{\alpha+1}$, that is $2 \cdot 2^{\alpha-1}$ and $4 \cdot 2^{\alpha-1}$, the number between them
divisible by the highest power of 2 being $3 \cdot 2^{\alpha-1}$; if the sequence contains this
number, it is the number we seek, otherwise the sequence lies entirely within
one of the two intervals, say between $2 \cdot 2^{\alpha-1}$ and $3 \cdot 2^{\alpha-1}$, that is, $4 \cdot 2^{\alpha-2}$ and
$6 \cdot 2^{\alpha-2}$, the number between them divisible by the highest power of 2 being $5 \cdot 2^{\alpha-2}$.
The process continues until the sequence contains one of the key numbers or is of
length 2, in which case we select the even number.
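The lemma can be spot-checked exhaustively on small ranges. This sketch (illustrative names `two_adic_valuation` and `top_power_of_two_term`, not from the text) finds, for every run of consecutive integers inside 1..150, the term divisible by the highest power of 2, and raises an error if that maximum were ever shared.

```python
def two_adic_valuation(k: int) -> int:
    """Exponent of the highest power of 2 dividing k, for k >= 1."""
    return (k & -k).bit_length() - 1


def top_power_of_two_term(m: int, n: int) -> int:
    """Return the unique k in m..n divisible by the highest power of 2.

    Raises ValueError if two terms tie (the argument above says this never happens).
    """
    valuations = {k: two_adic_valuation(k) for k in range(m, n + 1)}
    top = max(valuations.values())
    winners = [k for k, v in valuations.items() if v == top]
    if len(winners) != 1:
        raise ValueError(f"tie in {m}..{n}: {winners}")
    return winners[0]


print(top_power_of_two_term(9, 15))   # 12, which is 3 * 2^2

# Exhaustive check over all consecutive runs inside 1..150.
for m in range(1, 151):
    for n in range(m, 151):
        top_power_of_two_term(m, n)
```

In the run 17..31, for instance, the winner is 24, which is the $3 \cdot 2^{\alpha-1}$ of the argument with $\alpha = 4$.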

Now suppose that we factorize each of the denominators of $S_{mn}$ into a product
of prime factors and select the unique term whose denominator contains the
highest power of 2, say $2^\beta$; call it $1/k$. When each term of $S_{mn}$ is written as a
fraction whose denominator is the least common multiple of all of these denominators,
$1/k$ must have an odd numerator and the numerators of all of the others must be even.
Consequently, the numerator of $S_{mn}$, considered as a single fraction, must be odd.
The denominator contains the factor $2^\beta$ with $\beta \ge 1$ (there are at least two
consecutive denominators, so at least one is even), and any cancellation with the odd
numerator removes only odd factors, so the denominator stays even and we have the
result. Of course, taking $m = 1$ proves the result for $H_n$.
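Exact rational arithmetic makes the parity claim easy to test on small cases. This is a finite check, not a proof; the helper `s` is an illustrative name, and Python's `Fraction` always stores its value in lowest terms.

```python
from fractions import Fraction


def s(m: int, n: int) -> Fraction:
    """S_mn = 1/m + 1/(m+1) + ... + 1/n as an exact fraction in lowest terms."""
    return sum((Fraction(1, k) for k in range(m, n + 1)), Fraction(0))


# Check every 1 <= m < 60 and m < n < 120, extending each sum one term at a time.
for m in range(1, 60):
    total = Fraction(1, m)
    for n in range(m + 1, 120):
        total += Fraction(1, n)
        # Odd numerator over even denominator, so S_mn cannot be an integer.
        assert total.numerator % 2 == 1 and total.denominator % 2 == 0, (m, n)

print(s(1, 6), s(3, 7))   # 49/20 153/140
```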

2.3.3 $H_n$ is almost always a non-terminating decimal

We have that $H_1 = 1$, $H_2 = 1.5$ and $H_6 = 2.45$ and of course, since $H_n$ is
always a fraction, its decimal expansion must either be finite, as with these
examples, or infinitely recurring. The final surprise is that, apart from these
three cases, all of the other $H_n$ are of the infinitely recurring variety. Our proof of
this remarkable fact will take us from comparatively shallow to very deep water,
requiring a most profound and significant result of number theory: the
Bertrand Conjecture. In 1845 the French mathematician Joseph Bertrand (1822-1900)
conjectured that for every positive integer $n > 1$, there exists at least one
prime $p$ satisfying $n < p < 2n$ (having verified it for $n < 3\,000\,000$). He was
not destined to provide a proof, but five years later the Russian mathematician
Pafnuty Chebychev (1821-1894) was, and he was to come close to proving
another of the great results of mathematics, the Prime Number Theorem, but
more of that much later.

Firstly, it is clear that any number which can be represented as a finite decimal
can be written as a fraction with denominator a power of 10. Ten is two times
five and so the denominator is a power of two times a power of five and, after
possible cancellation, of the form $2^\alpha 5^\beta$. To show that $H_n$ is not a finite decimal,
it is enough to show that the denominator of $H_n$, when it is written as a single
fraction in its lowest terms, contains a prime factor greater than 5. Simply by writing
out $H_3 = \frac{11}{6}$, $H_4 = \frac{25}{12}$ and $H_5 = \frac{137}{60}$ we can establish that they are infinitely
recurring; now write $H_n = a_n/b_n$, $n \ge 7$, where $a_n/b_n$ is in its lowest terms. We need
to show that $b_n$ is divisible by some prime $p \ge 7$. To that end we will prove that for
all primes $p \in [\frac12(n+1), n]$, $p$ divides $b_n$, and do so by induction on $n$. For $n = 7$
the interval is $[4, 7]$, the set of primes is $\{5, 7\}$ and since $H_7 = \frac{363}{140}$ we are done.
Now assume the result for $n$; then we need to show that for all primes
$p \in [\frac12(n+2), n+1]$, $p$ divides $b_{n+1}$, where

$$\frac{a_{n+1}}{b_{n+1}} = \frac{a_n}{b_n} + \frac{1}{n+1} = \frac{a_n(n+1) + b_n}{b_n(n+1)}.$$

This new interval can only add $n + 1$ to the list of primes. If $n + 1$ is prime, the
factor $n + 1$ of the denominator $b_n(n+1)$ is incapable of cancellation with the numerator,
since it divides $a_n(n+1)$ but not $b_n$. Any other prime $p$ in the new interval divides
$b_n$ by hypothesis but divides neither $a_n$ nor $n + 1$ (as $2p > n + 1$), so it cannot cancel
either. We have what we need and the result is true by induction. The Bertrand
Conjecture guarantees that the intervals $[p, 2p - 1]$ for primes $p \ge 7$ overlap and
therefore together contain every integer $n \ge 7$, since it guarantees a prime between
every pair $p$ and $2p$. Now we have all that we need. If $n \ge 7$, there is a prime $p \ge 7$
such that $n \in [p, 2p - 1]$, which means that $p \in [\frac12(n+1), n]$ and so $p$ divides $b_n$.
The reader may wish to look at what happens with $p = 5$.
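The argument can be checked mechanically for small $n$ with exact fractions. This sketch (a finite check up to $n = 300$, not a proof; helper names are illustrative) records which $H_n$ have denominators of the form $2^\alpha 5^\beta$ and verifies the inductive claim that every prime in $[\frac12(n+1), n]$ divides $b_n$.

```python
from fractions import Fraction


def is_prime(n: int) -> bool:
    """Trial division; fine for the small n used here."""
    return n >= 2 and all(n % d for d in range(2, int(n**0.5) + 1))


def strip_twos_and_fives(b: int) -> int:
    """Remove every factor 2 and 5 from b."""
    for p in (2, 5):
        while b % p == 0:
            b //= p
    return b


terminating = []
h = Fraction(0)
for n in range(1, 301):
    h += Fraction(1, n)               # h = H_n = a_n / b_n, kept in lowest terms
    b = h.denominator
    if strip_twos_and_fives(b) == 1:  # b = 2^alpha 5^beta, so H_n is a finite decimal
        terminating.append(n)
    if n >= 7:
        # The inductive claim: every prime p in [(n + 1)/2, n] divides b_n.
        for p in range(n // 2 + 1, n + 1):
            if is_prime(p):
                assert b % p == 0, (n, p)

print(terminating)   # [1, 2, 6]
```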

Having studied the full harmonic series, we will look at some interesting 
subseries of it. 
CHAPTER THREE


Sub-Harmonic Series 


The mathematician requires tact and good taste at every step of his work, and 
he has to learn to trust to his own instinct to distinguish between what is really 
worthy of his efforts and what is not. 

James Glaisher (1848-1928) 


The incredibly slow divergence of $H_n$ suggests that we would not need to alter
its terms by much to force convergence, and by altering we mean omitting or 
cancelling. In this chapter, we will attempt just that. 


3.1 A Gentle Start 


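If we start taking out terms in a structured way, we might start with

$$\frac12 + \frac14 + \frac16 + \frac18 + \cdots = \frac12\left(1 + \frac12 + \frac13 + \frac14 + \cdots\right)$$

or alternatively

$$1 + \frac13 + \frac15 + \frac17 + \cdots > \frac12 + \frac14 + \frac16 + \frac18 + \cdots = \frac12\left(1 + \frac12 + \frac13 + \frac14 + \cdots\right),$$

both of which clearly diverge, which will have implications on p. 102.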

So removing ‘half’ of the terms is not enough to force the depleted series
to converge, nor would removing a third or any other fraction of them be. Taking
only powers of any single number greater than 1 leaves us with a convergent geometric
series, but that really is taking out an awful lot of terms and not, in our development,
very interesting. Is there something in between? A tantalizing possibility is to sum
over the reciprocals of odd perfect numbers (a perfect number is an integer
which is equal to the sum of its divisors other than itself; for example,
$6 = 1 + 2 + 3$ and $28 = 1 + 2 + 4 + 7 + 14$).
It is known that such a sum is finite; the problem is that no examples of these
numbers are known and so our series may be entirely non-existent!
3.2 Harmonic Series of Primes

Primes are forever a source of interest and their scarcity (we will see just how
scarce they are later) makes the series taken over the reciprocals of primes a
pretty sparse one, and their lack of pattern (and we will be looking at that later
too) a very attractive one. The series

$$\sum_{p \text{ prime}} \frac1p = \frac12 + \frac13 + \frac15 + \frac17 + \frac1{11} + \frac1{13} + \cdots$$

has indeed had a great deal removed from $H_\infty$, but amazingly this also diverges.
Of course, this must mean that there is an infinite number of primes, a fact
established by Euclid in about 300 BC. It is well worth looking at one version
of his famous proof, as well as at another, entirely different, equally elegant and
more modern argument. Firstly, Euclid.

Suppose that there is a finite number of primes and that the biggest of them
is $N$; then the considerably bigger number composed of 1 plus the product of
all of the primes up to and including $N$, $P = 1 + 2 \times 3 \times 5 \times 7 \times \cdots \times N$, either
is prime (which would contradict our assumption that $N$ is the biggest prime)
or it is composite, and therefore divisible by primes. All of the primes up to $N$ leave
a remainder of 1 when dividing $P$, and so its prime factors must be bigger than
$N$, which again contradicts the assumption that the number of primes
is finite, with $N$ as the biggest. The only escape is that the number of primes is
infinite, and the proof is complete.

If $P$ does happen to be prime it is given the appropriate name of ‘Euclidean’
prime. How common are these Euclidean primes? Things start off productively,
with $P$ a prime for $N$ any of the first five primes 2, 3, 5, 7 and 11 (giving $P$ as
3, 7, 31, 211 and 2311, respectively); the next Euclidean prime appears when
$N = 31$, to give $P = 200\,560\,490\,131$, and the only other example for $N$ less
than 1000 is $N = 379$, with $P$ rather too big to list! At present the largest known
example is with $N = 24\,029$. Is there an infinite number? Nobody knows, but
they do become very rare as $N$ becomes very large.
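The small Euclidean primes can be recovered directly. This sketch uses plain trial division (adequate for numbers of about twelve digits, hopeless for the large examples mentioned above); `is_prime` and `euclidean_primes` are illustrative names.

```python
def is_prime(n: int) -> bool:
    """Deterministic trial division by 2, 3 and numbers of the form 6k +/- 1."""
    if n < 2:
        return False
    if n % 2 == 0 or n % 3 == 0:
        return n in (2, 3)
    d = 5
    while d * d <= n:
        if n % d == 0 or n % (d + 2) == 0:
            return False
        d += 6
    return True


def euclidean_primes(limit: int) -> list[tuple[int, int]]:
    """Pairs (N, P) with N prime, N <= limit, and P = 1 + (product of primes <= N) prime."""
    found, product = [], 1
    for n in range(2, limit + 1):
        if is_prime(n):
            product *= n
            if is_prime(product + 1):
                found.append((n, product + 1))
    return found


print(euclidean_primes(31))
```

The gap between $N = 11$ and $N = 31$ comes from composites such as $1 + 2 \times 3 \times 5 \times 7 \times 11 \times 13 = 30\,031 = 59 \times 509$.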

A modern number theorist’s proof of Euclid’s result looks and feels different.
In 1938 the consummate practitioner Paul Erdős (1913-1996) gave the one that
follows, which uses a counting technique and a neat device used by number
theorists: that any integer can always be written as the product of a square
and a square-free integer. This is clear enough if the integer is factorized into
the product of its prime factors and the repeated ones collected together; for
example, $31\,370\,625 = 3^3 \times 5^4 \times 11 \times 13^2 = 3 \times 11 \times (3 \times 5^2 \times 13)^2$; of course,
for a perfect square, the square-free part is 1. When we discuss the Riemann
Hypothesis we will come across the Möbius function and see just how important
it can be that an integer does or does not contain repeated factors. The proof is
as follows.
Let $N$ be any positive integer and $p_1, p_2, p_3, \ldots, p_n$ the complete set of
primes less than or equal to $N$; then each of the positive integers less than or
equal to $N$ can, of course, be written as a product of powers of the $p_i$ and, using
the above observation, in the form $p_1^{e_1} p_2^{e_2} p_3^{e_3} \cdots p_n^{e_n} \times m^2$, where $e_i \in \{0, 1\}$,
depending on whether a particular prime is present in the square-free part or not.
Consequently, there are $2^n$ ways of choosing the square-free part and clearly $m^2 \le N$
and so $m \le \sqrt{N}$. This means that the integers less than or equal to $N$ can be
chosen in at most $2^n \times \sqrt{N}$ ways and therefore that $N \le 2^n \times \sqrt{N}$, which
makes $2^n \ge \sqrt{N}$ and $n \ge \frac12 \log_2 N$. Since $N$ is unbounded, so must the number
of primes be.
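Both ingredients of the proof are easy to try out. The sketch below (illustrative helper names, not from the text) splits an integer into its square-free part $k$ and square root $m$, applies this to $3^3 \times 5^4 \times 11 \times 13^2$, and compares the true prime count with the very weak lower bound $\frac12 \log_2 N$ that the argument delivers.

```python
import math


def square_and_squarefree(n: int) -> tuple[int, int]:
    """Write n = k * m^2 with k square-free; return (k, m)."""
    k, m, p = 1, 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        k *= p ** (exponent % 2)    # an odd exponent leaves one copy of p in k
        m *= p ** (exponent // 2)   # pairs of p's go into the square
        p += 1
    return k * n, m                 # any leftover n > 1 is a prime appearing once


def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


print(square_and_squarefree(31_370_625))   # (33, 975)

# Every j <= N really is k * m^2 with m <= sqrt(N), as the counting argument assumes.
N = 1000
for j in range(1, N + 1):
    k, m = square_and_squarefree(j)
    assert k * m * m == j and m <= math.isqrt(N)

# Erdős's bound: the number n of primes up to N satisfies n >= (1/2) log2 N.
for N in (10, 100, 10_000, 1_000_000):
    n = len(primes_up_to(N))
    assert n >= 0.5 * math.log2(N)
    print(N, n, round(0.5 * math.log2(N), 2))
```

The bound is hopelessly weak (it promises about 10 primes below a million, where there are 78 498), but weakness does not matter: it is unbounded, and that is all the proof needs.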

The proof leaves one breathless and wondering how anyone could ever have 
thought of it, but that was part of the genius of the man. 

With the prime series now definitely infinite, we look to establishing its
divergence. Euler (inevitably) attacked the problem and in doing so brought
about an incredible result that spawned the whole subject of analytic number
theory. Here we will give a proof based on Erdős’s extension of his argument.
Suppose that the series does converge. Then there must be a tail of the series
which sums to less than $\frac12$, that is, there must exist an $i$ such that

$$\frac1{p_{i+1}} + \frac1{p_{i+2}} + \frac1{p_{i+3}} + \cdots < \frac12.$$
Now let $N_i(x)$ be the number of positive integers less than $x$ which are divisible
by none of the primes beyond the first $i$ primes. If $n$ is one of them, as before we
can write $n = k \times m^2$, where $k$ is a square-free number. Since there are precisely
$i$ primes that could divide $k$, $k = p_1^{a_1} p_2^{a_2} p_3^{a_3} \cdots p_i^{a_i}$, where $a_r \in \{0, 1\}$, and so
there are $2^i$ possible values for $k$, depending on whether a particular prime is
present or not. Clearly, $m^2 \le n < x$ and so $m$ can be chosen in fewer than $\sqrt{x}$ ways;
consequently, $N_i(x) < 2^i \sqrt{x}$. The number of positive integers less than $x$ which are
divisible by a prime $p$ is at most $x/p$ (consider $p, 2p, 3p, \ldots, np$, where $np < x$ and
so $n < x/p$); therefore the number of positive integers less than $x$ which are
divisible by some prime other than the first $i$ primes is at most

$$\frac{x}{p_{i+1}} + \frac{x}{p_{i+2}} + \frac{x}{p_{i+3}} + \cdots,$$

which is, of course, less than $\frac12 x$. But by definition this number is $x - N_i(x)$, hence
$x - N_i(x) < \frac12 x$ and $N_i(x) > \frac12 x$. Combining these two bounds we have
$\frac12 x < N_i(x) < 2^i \sqrt{x}$ and hence $\frac12 x < 2^i \sqrt{x}$, which is true only for $x < 2^{2i+2}$.
Take $x \ge 2^{2i+2}$ and we have our contradiction! And a perfectly beautiful one too.
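The upper bound $N_i(x) < 2^i\sqrt{x}$ is the heart of the argument, and it can be checked by direct counting. In this sketch the list `FIRST_PRIMES` and the function `n_i` are illustrative; the check is finite and only confirms the inequality, it does not replace the proof.

```python
import math

FIRST_PRIMES = [2, 3, 5, 7, 11, 13]


def n_i(i: int, x: int) -> int:
    """N_i(x): how many positive integers below x have no prime factor beyond the first i primes."""
    count = 0
    for n in range(1, x):
        for p in FIRST_PRIMES[:i]:
            while n % p == 0:
                n //= p
        if n == 1:
            count += 1
    return count


for i in range(1, len(FIRST_PRIMES) + 1):
    for x in (10, 1000, 20_000):
        assert n_i(i, x) < 2**i * math.sqrt(x)
    print(i, n_i(i, 20_000), round(2**i * math.sqrt(20_000)))
```

Once $x \ge 2^{2i+2}$ the bound $2^i\sqrt{x}$ is no more than $\frac12 x$, which is exactly where the assumed small tail would be contradicted.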

So, the sum of the reciprocals of the primes diverges, but how slowly? Very
slowly. For example,

$$\sum_{\substack{p < 1\,000\,000 \\ p \text{ prime}}} \frac1p = 2.887\,289\ldots$$
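A sieve makes this sum quick to reproduce. The sketch below (illustrative names; floating-point summation with `math.fsum`) should print a value close to the figure quoted above.

```python
import math


def primes_below(n: int) -> list[int]:
    """All primes p < n, by the sieve of Eratosthenes (n >= 2)."""
    sieve = bytearray([1]) * n
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n - 1) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n, p)))
    return [i for i, flag in enumerate(sieve) if flag]


primes = primes_below(10**6)
total = math.fsum(1 / p for p in primes)
print(len(primes), total)
```

Some 78 498 terms are needed to get the sum not quite to 3.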
Our computer, which adds in a new term to the sum every $10^{-9}$ seconds, would,
after 15 (American) billion years, have summed the series to a number just over
4. We will look at a form of Euler’s proof later and that will also provide us
with a measure of the rate of this glacially slow divergence.

As with the full harmonic series, even though the sum of the reciprocals of
primes diverges, it manages to miss every integer. The proof is, surprisingly,
easier than the one for the harmonic series. In fact, for any sequence of distinct
primes $p_1, p_2, \ldots, p_m$, if

$$\frac1{p_1} + \frac1{p_2} + \frac1{p_3} + \cdots + \frac1{p_m} = n$$

for some integer $n$, then

$$\frac1{p_1} = n - \frac1{p_2} - \frac1{p_3} - \cdots - \frac1{p_m} = \frac{a}{p_2 p_3 p_4 \cdots p_m}$$

for some integer $a$, hence $a p_1 = p_2 p_3 p_4 \cdots p_m$ and so $p_2 p_3 p_4 \cdots p_m$ is
divisible by $p_1$, which is impossible.
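With exact fractions one can watch this happen: in lowest terms, the denominator of $\frac1{p_1} + \cdots + \frac1{p_j}$ is the full product $p_1 p_2 \cdots p_j$, so it is never 1. This sketch checks the first 100 primes; `first_primes` is an illustrative helper.

```python
import math
from fractions import Fraction


def first_primes(count: int) -> list[int]:
    """The first `count` primes, by trial division against the primes found so far."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


primes = first_primes(100)
total = Fraction(0)
for j, p in enumerate(primes, start=1):
    total += Fraction(1, p)
    # In lowest terms the denominator is exactly p_1 p_2 ... p_j, so it is never 1.
    assert total.denominator == math.prod(primes[:j])
print(float(total))
```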

Leaving only the primes fails to force convergence. If we pursue this thread,
the most natural next step is to leave only the twin primes, that is, pairs of
primes that differ by 2; it is customary (but not universal) to ignore 2 for this
purpose and to count 5 twice, so the pairs are (3, 5), (5, 7), (11, 13), ..., (1019, 1021), ...,
and these are incredibly sparse. In fact, it is not even known whether there is
an infinite number of them and therefore whether our series has infinitely many terms
(the conjecture that there are infinitely many is called the Twin Primes Conjecture).
It is interesting to note that the pair (1019, 1021) generates two Euclidean primes.
Using only twin primes, all that is left of $H_\infty$ is

$$\left(\frac13 + \frac15\right) + \left(\frac15 + \frac17\right) + \left(\frac1{11} + \frac1{13}\right) + \left(\frac1{17} + \frac1{19}\right) + \cdots$$

Do we achieve convergence now? Finally, the answer is yes, but no one is sure
exactly to what number; it is about 1.902 160 582 4... and is known as Brun’s
constant, after the Norwegian mathematician Viggo Brun (1885-1978), who,
in 1919, established the convergence. Not much is known about it, although its
size is a strong indicator of just how sparse twin primes are. Thomas Nicely
provided the above estimate in 1994 and in the process uncovered the infamous
and much-publicized Intel Pentium division bug (‘for a mathematician to get this
much publicity, he would normally have to shoot someone’), which made itself
apparent with the pair of twin primes 824 633 702 441 and 824 633 702 443.
His announcement to the world was by a now famous email, which began: 

It appears that there is a bug in the floating point unit (numeric
coprocessor) of many, and perhaps all, Pentium processors. In
short, the Pentium FPU is returning erroneous values for certain
division operations. For example, 1/824633702441.0 is calculated
incorrectly (all digits beyond the eighth significant digit are
in error)...
On 17 January 1995 Intel announced a pre-tax charge of 475 million US dollars against
earnings, as the total cost associated with the replacement of the flawed chips. 

Incidentally, it would have been very convenient had the series diverged, as
that would have meant that there is an infinite number of twin primes and so
resolved the Twin Primes Conjecture. (The reader may convince themselves
that 5 is the only candidate for repetition by reasoning that any prime greater
than 3 must be of the form $6n \pm 1$, so any pair of twin primes beyond (3, 5) must be
$6n - 1$ and $6n + 1$, and therefore that three primes in a row, each 2 more than the
last, are impossible beyond 3, 5, 7.)
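A sieve shows both how slowly the twin-prime series creeps towards Brun's constant and why 5 is the only prime counted twice. This is a sketch under the conventions stated above (2 ignored, 5 counted in both (3, 5) and (5, 7)); `prime_flags` and the limit of one million are illustrative choices.

```python
import math


def prime_flags(limit: int) -> bytearray:
    """flags[k] == 1 exactly when k is prime, for 0 <= k <= limit (sieve of Eratosthenes)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return flags


LIMIT = 10**6
is_prime = prime_flags(LIMIT + 4)

# Twin-prime pairs (p, p + 2) with p + 2 <= LIMIT; 5 appears in both (3, 5) and (5, 7).
pairs = [(p, p + 2) for p in range(3, LIMIT - 1) if is_prime[p] and is_prime[p + 2]]
brun_partial = math.fsum(1 / p + 1 / q for p, q in pairs)
print(pairs[:6], len(pairs), brun_partial)

# Primes above 3 are 6n - 1 or 6n + 1, so p, p + 2, p + 4 are all prime only for p = 3.
assert all(p % 6 in (1, 5) for p in range(5, LIMIT) if is_prime[p])
triples = [p for p in range(3, LIMIT) if is_prime[p] and is_prime[p + 2] and is_prime[p + 4]]
print(triples)
```

Even with every twin pair below a million included, the partial sum is still well short of 1.902..., which is why estimating the constant needs far larger searches plus an estimate of the tail.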


3.3 The Kempner Series 


The most novel culling of the terms of the harmonic series has to be due to A. J.
Kempner, who in 1914 considered what would happen if all terms whose
denominators contain a particular digit were removed from it. For
example, if we choose the digit 7, we would exclude the terms with denominators
such as 7, 27, 173, 33 779, etc. There are 10 such series, each resulting from the
removal of one of the digits 0, 1, 2, ..., 9, and the first question which naturally
arises is just what percentage of the terms of the series we are removing by the
process. For example, if we remove all terms involving 0 we are left with


$$1 + \frac12 + \frac13 + \cdots + \frac19 + \frac1{11} + \cdots + \frac1{19} + \frac1{21} + \cdots + \frac1{99} + \frac1{111} + \cdots + \frac1{119} + \frac1{121} + \cdots + \frac1{999} + \cdots$$


whereas if we remove all terms including 1 we are left with 


$$\frac12 + \frac13 + \cdots + \frac19 + \frac1{20} + \frac1{22} + \frac1{23} + \cdots + \frac1{30} + \frac1{32} + \cdots + \frac1{99} + \frac1{200} + \frac1{202} + \cdots + \frac1{999} + \cdots$$


Up to a given limit, we can count exactly how many terms have been removed 
by grouping the denominators of the terms by the number of digits they have 
in them, firstly assuming that we are removing 0 (see Table 3.1). 

This means that when we have culled the denominators involving a 0 we are
left with

$$9 + 9^2 + 9^3 + 9^4 + \cdots + 9^n = \frac{9(9^n - 1)}{9 - 1} = \frac98(9^n - 1)$$

terms of the $10^n - 1$ possible.

If we now perform the same analysis when we remove the digit 1 instead, 
we have Table 3.2. 
Table 3.1. Removing the digit 0.

Denominator range              Number of allowed denominators
1 to 9                         9
10 to 99                       $9 \times 9 = 9^2$
100 to 999                     $9 \times 9 \times 9 = 9^3$
1000 to 9999                   $9 \times 9 \times 9 \times 9 = 9^4$
$10^{n-1}$ to $10^n - 1$       $9^n$

Table 3.2. Removing the digit 1.

Denominator range              Number of allowed denominators
1 to 9                         8
10 to 99                       $8 \times 9$
100 to 999                     $8 \times 9 \times 9 = 8 \times 9^2$
1000 to 9999                   $8 \times 9 \times 9 \times 9 = 8 \times 9^3$
$10^{n-1}$ to $10^n - 1$       $8 \times 9^{n-1}$

The difference arises from the fact that 0 is now allowable but cannot be the
first digit of any number. Now we are left with

$$8 + 8 \times 9 + 8 \times 9^2 + 8 \times 9^3 + \cdots + 8 \times 9^{n-1} = 9^n - 1$$

terms of the $10^n - 1$ possible.

It is obvious that this last argument is valid for each of the other digits 2, ... ,9 
even though the actual sums (given they exist) will vary with the digit removed. 

Looked at in a different way, with the digit 0 the fraction of terms that we
have removed is

$$\frac{(10^n - 1) - \frac98(9^n - 1)}{10^n - 1} = 1 - \frac98 \cdot \frac{9^n - 1}{10^n - 1} \to 1 - 0 = 1 \quad \text{as } n \to \infty,$$

and with the other digits it is

$$\frac{(10^n - 1) - (9^n - 1)}{10^n - 1} = 1 - \frac{9^n - 1}{10^n - 1} \to 1 - 0 = 1 \quad \text{as } n \to \infty.$$
That is, asymptotically, we have removed ‘almost all’ terms! Put another way,
we have the initially startling fact that almost all integers contain every possible 
digit. If we reflect on the number of digits that integers have as they get bigger, 
this is less surprising perhaps. 
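The counts in Tables 3.1 and 3.2 can be confirmed by brute force for small $n$, and the closed forms then show the removed fraction creeping towards 1. The function name `count_without_digit` is illustrative; the brute-force check stops at four-digit numbers.

```python
def count_without_digit(digit: int, n: int) -> int:
    """How many of 1, 2, ..., 10^n - 1 do not contain `digit` in their decimal form."""
    d = str(digit)
    return sum(1 for k in range(1, 10**n) if d not in str(k))


for n in range(1, 5):
    assert count_without_digit(0, n) == 9 * (9**n - 1) // 8
    for digit in range(1, 10):
        assert count_without_digit(digit, n) == 9**n - 1

# Fraction of the 10^n - 1 terms removed, using the closed forms above.
for n in (1, 2, 5, 20, 100):
    removed_zero = 1 - (9 * (9**n - 1) / 8) / (10**n - 1)
    removed_other = 1 - (9**n - 1) / (10**n - 1)
    print(n, round(removed_zero, 4), round(removed_other, 4))
```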

So, we really have removed a great many terms from the harmonic series
and it should be no surprise that the depleted series do in fact converge. To see
this, again we need to take separately the cases of the removed digit being 0
or otherwise. If we look back to Table 3.1 we have that the nine single-digit
integers are each greater than or equal to 1, which makes the terms with those as
denominators each less than or equal to 1; the $9^2$ double-digit integers are each
greater than or equal to 10, which makes the terms with those as denominators
each less than or equal to $\frac1{10}$, etc., to give an upper bound for the sum of the
series of

$$9 \times 1 + 9^2 \times \frac1{10} + 9^3 \times \frac1{10^2} + 9^4 \times \frac1{10^3} + \cdots = 9\left(1 + \frac9{10} + \left(\frac9{10}\right)^2 + \cdots\right) = 9 \times \frac{1}{1 - \frac9{10}} = 90.$$




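The necessary changes for the other digits bring about an upper bound of

$$8 \times 1 + 8 \times 9 \times \frac1{10} + 8 \times 9^2 \times \frac1{10^2} + 8 \times 9^3 \times \frac1{10^3} + \cdots = 8\left(1 + \frac9{10} + \left(\frac9{10}\right)^2 + \cdots\right) = 8 \times \frac{1}{1 - \frac9{10}} = 80.$$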
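The block-by-block bound can be checked numerically: for each number of digits $k$, the actual sum of the surviving terms never exceeds (number of allowed denominators) $\times\, 1/10^{k-1}$. This sketch checks blocks up to four digits for every missing digit; `block_sum` and `block_bound` are illustrative names.

```python
import math
from fractions import Fraction


def block_sum(digit: int, k: int) -> float:
    """Sum of 1/j over k-digit j whose decimal form avoids `digit`."""
    d = str(digit)
    return math.fsum(1 / j for j in range(10 ** (k - 1), 10**k) if d not in str(j))


def block_bound(digit: int, k: int) -> Fraction:
    """(Number of allowed k-digit denominators) x 1/10^(k-1), as in Tables 3.1 and 3.2."""
    count = 9**k if digit == 0 else 8 * 9 ** (k - 1)
    return Fraction(count, 10 ** (k - 1))


ratio = Fraction(9, 10)
print(9 / (1 - ratio), 8 / (1 - ratio))   # geometric totals: 90 80

for digit in range(10):
    for k in range(1, 5):
        assert block_sum(digit, k) <= block_bound(digit, k)
```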


These are loose bounds but they do their job and show that the series do indeed 
converge. Of course, the slowness of the convergence hinders the computation 
of the exact sums, but R. Baillie has provided a method for summing the series 
with great accuracy and economy which resulted in Table 3.3, here given to five 
decimal places. 


Table 3.3. The Kempner-depleted harmonic sums.

Missing digit    Sum
0                23.103 44
1                16.176 96
2                19.257 35
3                20.569 87
4                21.327 46
5                21.834 60
6                22.205 59
7                22.493 47
8                22.726 36
9                22.920 67


3.4 Madelung’s Constants

Finally, having omitted terms, we can take the alternative route and cancel
them, most famously by considering the series $1 - \frac12 + \frac13 - \frac14 + \cdots$ to get the
alternating harmonic series, which sums to $\ln 2$, which is of course a special
case of that Newton-Mercator logarithmic series. A more intriguing alternative
is to consider a more complicated modification to get

$$-\frac41 + \frac42 + \frac44 - \frac85 + \frac48 - \frac49 + \frac8{10} - \cdots,$$

which may at first seem a touch arbitrary. The pattern is revealed when the series
is written in sigma notation, to get

$$\sum_{i=1}^{\infty} (-1)^i \frac{r_2(i)}{i}, \qquad (3.1)$$


where $r_2(i)$ is the number of ways of representing the integer $i$ as the sum of
two squares (including 0 and negative integers, so
$4 = 0^2 + 2^2 = 0^2 + (-2)^2 = 2^2 + 0^2 = (-2)^2 + 0^2$ and $r_2(4) = 4$).
The missing terms (denominators 3, 6, 7, ...) come about because not all integers
can be so expressed. Whether or not a particular integer is capable of being
expressed as the sum of two squares was originally established by Euler, when in
1738 he published the result that a positive integer can be so expressed if and only
if each of its prime factors of the form $4k + 3$ occurs as an even power.
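Counting representations directly and testing Euler's criterion side by side is a good way to see where the gaps in (3.1) come from. In this sketch `r2` counts ordered pairs with signs exactly as in the definition above, and `euler_two_squares_test` is an illustrative implementation of the criterion.

```python
import math


def r2(i: int) -> int:
    """Number of ordered integer pairs (a, b), signs included, with a^2 + b^2 = i."""
    count = 0
    for a in range(-math.isqrt(i), math.isqrt(i) + 1):
        rest = i - a * a
        b = math.isqrt(rest)
        if b * b == rest:
            count += 1 if b == 0 else 2   # b and -b are different pairs unless b = 0
    return count


def euler_two_squares_test(i: int) -> bool:
    """True if every prime factor of i of the form 4k + 3 occurs to an even power."""
    p = 2
    while p * p <= i:
        exponent = 0
        while i % p == 0:
            i //= p
            exponent += 1
        if p % 4 == 3 and exponent % 2 == 1:
            return False
        p += 1
    return i % 4 != 3   # what is left is 1 or a single prime


print([r2(i) for i in range(1, 11)])   # [4, 4, 0, 4, 8, 0, 0, 4, 4, 8]

for i in range(1, 5000):
    assert (r2(i) > 0) == euler_two_squares_test(i)
```

The printed list supplies the numerators $-4, 4, 4, -8, 4, -4, 8$ of the terms written out before (3.1), once the signs $(-1)^i$ are attached.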

It is hardly obvious, but the above series does converge and the limit is known
to be $-\pi \ln 2$. Less obvious still is the fact that the series is connected with rock
salt. The crystallographic structure of NaCl is that of a cubic lattice and the
electrostatic potential at the origin caused by unit charges of alternating sign at
those lattice points is, by definition,

$$M_3 = \sum_{i,j,k=-\infty}^{\infty} \frac{(-1)^{i+j+k}}{\sqrt{i^2 + j^2 + k^2}},$$

where not all three variables can be simultaneously zero. The series is a very
delicate one, as we can see by considering the subseries of it with $k = 0$ and
$i = j$, whose terms are $1/(\sqrt{2}\,|i|)$: this brings about the infinite harmonic series once
again, and that of course diverges. The erratic behaviour of the series can be seen
in Figure 3.1, the first of many bizarre graphs that we will consider.
Notwithstanding this, a form of convergence can be defined for the series
and with that definition of convergence its sum is $-1.747\,564\,59\ldots$, which is
one of the Madelung constants. An alternative formulation is

$$\sum_{i=1}^{\infty} (-1)^i \frac{r_3(i)}{\sqrt{i}},$$

with $r_3(i)$ the number of ways in which the integer $i$ can be written as the sum
of three squares, which in Flatland reduces to

$$\sum_{i=1}^{\infty} (-1)^i \frac{r_2(i)}{\sqrt{i}},$$

with the cubic lattice becoming a square one and the convergence a much
happier one, as Figure 3.2 indicates.
Its sum is $-1.615\,54\ldots$, another Madelung constant (of which there is
an infinite number as the dimension of space increases), and one which involves
the Zeta function, which we will be meeting next.

Our series (3.1) is derived from this by omitting Rudolff’s sign. 
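The slow, wobbling convergence of (3.1) towards $-\pi\ln 2$ can be seen by summing over lattice points. This sketch rests on the observation that $(-1)^{a^2+b^2} = (-1)^{a+b}$, so grouping the points with $a^2 + b^2 = i$ reproduces the term $(-1)^i r_2(i)/i$; `series_3_1` is an illustrative name, and the cut-offs are arbitrary choices.

```python
import math


def series_3_1(n_max: int) -> float:
    """Partial sum of (3.1): the sum over 1 <= i <= n_max of (-1)^i r_2(i) / i.

    Summed over lattice points (a, b) != (0, 0) with a^2 + b^2 <= n_max; each point
    contributes (-1)^(a^2 + b^2) / (a^2 + b^2), and (-1)^(a^2 + b^2) = (-1)^(a + b).
    """
    radius = math.isqrt(n_max)
    terms = []
    for a in range(-radius, radius + 1):
        for b in range(-radius, radius + 1):
            i = a * a + b * b
            if 0 < i <= n_max:
                terms.append(1 / i if (a + b) % 2 == 0 else -1 / i)
    return math.fsum(terms)


for n_max in (10, 100, 10_000, 40_000):
    print(n_max, series_3_1(n_max))
print("-pi ln 2 =", -math.pi * math.log(2))
```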





CHAPTER FOUR 


Zeta Functions 


We may — paraphrasing the famous sentence of George Orwell — say that ‘all 
mathematics is beautiful, yet some is more beautiful than the other’. But the 
most beautiful in all mathematics is the Zeta function. There is no doubt about 
it. 

Krzysztof Maslanka 


It is time to look at one of the ‘advanced’ functions of mathematics and one 
which lies at the core of the study of analytic number theory; a function which, 
according to M. C. Gutzwiller, ‘is probably the most challenging and mysterious 
object of modern mathematics’. We will see it here in its own right and, in 
Chapter 6, linked to a second ‘advanced’ function and again in the final chapter, 
where its deepest behaviour is the stuff of the Riemann Hypothesis. 


4.1 Where n Is a Positive Integer 
The series

$$\sum_{r=1}^{\infty} \frac1{r^2} = 1 + \frac1{2^2} + \frac1{3^2} + \cdots$$

holds a special place in mathematical lore. A simple calculation suggests that
it converges to the number 1.644 934..., which is hardly illuminating, and as
we have seen from the harmonic series, it might just be diverging very slowly.
Actually, it does converge and is a special case of the whole family of convergent
series defined for integers $n > 1$ by

$$\zeta(n) = \sum_{r=1}^{\infty} \frac1{r^n} = 1 + \frac1{2^n} + \frac1{3^n} + \cdots.$$
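Bracketing the terms and comparing them with the geometric series establishes
the convergence:

$$
\begin{aligned}
\sum_{r=1}^{\infty} \frac1{r^n} &= 1 + \left(\frac1{2^n} + \frac1{3^n}\right) + \left(\frac1{4^n} + \frac1{5^n} + \frac1{6^n} + \frac1{7^n}\right) + \cdots \\
&< 1 + \frac2{2^n} + \frac4{4^n} + \cdots = 1 + \frac1{2^{n-1}} + \left(\frac1{2^{n-1}}\right)^2 + \cdots = \frac{1}{1 - 1/2^{n-1}},
\end{aligned}
$$

provided that $1/2^{n-1} < 1$, that is, $2^{n-1} > 1$, $n - 1 > 0$ and $n > 1$.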
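The bracketing bound $1/(1 - 2^{1-n})$ is easy to compare with actual partial sums. The argument never uses the fact that $n$ is an integer, so this sketch also includes the real exponent 1.5; `zeta_partial` and `bracket_bound` are illustrative names.

```python
import math


def zeta_partial(n: float, terms: int) -> float:
    """1 + 1/2^n + ... + 1/terms^n."""
    return math.fsum(1 / r**n for r in range(1, terms + 1))


def bracket_bound(n: float) -> float:
    """The geometric bound 1 + 2^(1-n) + 4^(1-n) + ... = 1 / (1 - 2^(1-n)), valid for n > 1."""
    return 1 / (1 - 2 ** (1 - n))


for n in (2, 3, 4, 1.5):
    partial, bound = zeta_partial(n, 10**5), bracket_bound(n)
    assert partial < bound
    print(n, round(partial, 6), round(bound, 6))
```

For $n = 2$ the bound is 2, comfortably above 1.644 934..., so the bracketing is crude but entirely adequate for proving convergence.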

The above case (with $n = 2$) has a distinguished history, following its
appearance in 1650, when Pietro Mengoli (1625-1686) asked for its value. John Wallis
(1616-1703) computed it to three decimal places in 1665 but failed to recognize
the significance of 1.645 (reasonably enough). In 1673 Oldenburg posed the
problem to the great Gottfried von Leibniz (1646-1716) (who was defeated by
it) and it proved too much for other impressive mathematicians too, including
Jacob Bernoulli (1654-1705), who had included a reference to it in his 1689
tract, published in Basel, Tractatus de seriebus infinitis, with the entreaty, ‘If
anyone finds and communicates to us that which thus far has eluded our efforts,
great will be our gratitude’; and so the problem has become known as the
‘Basel Problem’, ‘the scourge of analysts’, according to Montucla. Jacob’s younger
brother, Johann Bernoulli (1667-1748), mentor to the young Euler, tried
and failed too, and perhaps it was he who encouraged his brilliant student to
attempt it; having attempted it, Euler eventually conquered it. In 1731 he
computed the sum to six decimal places, in 1735 he sharpened his calculation
to the number 1.644 934 066 848 226 436 47... and, later in that year, with his
star still in its early ascendancy, he wrote, ‘quite unexpectedly I have found an
elegant formula involving the quadrature of the circle’, by which he meant $\pi$.
With his genius for analytic manipulation and his characteristic disregard for
rigour he had shown that

$$\frac1{1^2} + \frac1{2^2} + \frac1{3^2} + \cdots = \frac{\pi^2}{6}.$$

The curious number 1.644 934... turns out to be $\frac16\pi^2$, an astonishing result that
did much to enhance Euler’s growing reputation. Not unreasonably, it, combined
with the divergence of the reciprocals of the primes, led him to remark (in 1737)
that there are many more primes than perfect squares. It would take more than
100 years and the controversial work of another mathematical giant (Georg
Cantor) to give rigour to this comment and, in doing so, to show it in that
rigorous sense false.

Euler’s original proof is magical and demands to appear here above all others,
including a later, more careful and completely different version provided by him
to answer critics. It begins with the standard Taylor expansion of $\sin x$,

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots,$$

which converges for all $x$. Euler interpreted the left-hand side as a polynomial
of infinite degree. Since it is a polynomial it can be written as a product of
factors and since the roots are $0, \pm\pi, \pm2\pi, \pm3\pi, \ldots$, the polynomial can be
written as a constant multiple of

$$x(x^2 - \pi^2)(x^2 - 4\pi^2)(x^2 - 9\pi^2)\cdots$$

and this can be rewritten as

$$Ax\left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{2^2\pi^2}\right)\left(1 - \frac{x^2}{3^2\pi^2}\right)\cdots.$$

Since

$$\frac{\sin x}{x} \to 1 \quad \text{as } x \to 0,$$

it must be that $A = 1$. So,

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = x\left(1 - \frac{x^2}{1^2\pi^2}\right)\left(1 - \frac{x^2}{2^2\pi^2}\right)\left(1 - \frac{x^2}{3^2\pi^2}\right)\cdots.$$

This astonishing piece of ingenuity is now part of the theory of infinite products,
and through that theory is made rigorous. Now he equated the coefficients of
$x^3$ on both sides to get

$$-\frac1{3!} = -\frac1{1^2\pi^2} - \frac1{2^2\pi^2} - \frac1{3^2\pi^2} - \frac1{4^2\pi^2} - \cdots$$

or

$$\frac1{1^2} + \frac1{2^2} + \frac1{3^2} + \cdots = \frac{\pi^2}{6}$$

and the result has appeared as if from nowhere.
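Euler's product can be tested numerically: truncating it after many factors should reproduce $\sin x$, and the partial sums of $\sum 1/k^2$ should approach $\pi^2/6$. This is a plausibility check only, not the rigorous theory of infinite products; `sine_product` is an illustrative name.

```python
import math


def sine_product(x: float, factors: int) -> float:
    """Truncated Euler product x * (1 - x^2/(1^2 pi^2)) * ... * (1 - x^2/(factors^2 pi^2))."""
    result = x
    for k in range(1, factors + 1):
        result *= 1 - x * x / (k * k * math.pi**2)
    return result


for x in (0.5, 1.0, 2.0):
    print(x, math.sin(x), sine_product(x, 100_000))

# Matching the x^3 coefficients gives sum 1/k^2 = pi^2/6; compare a partial sum.
basel_partial = math.fsum(1 / k**2 for k in range(1, 100_001))
print(basel_partial, math.pi**2 / 6)
```

Both comparisons converge slowly, with errors of roughly the size of 1 divided by the number of factors or terms kept, which matches the $1/k^2$ decay of what is left out.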

Bearing in mind the level of resourcefulness (and genius) required to establish
the result, we can share A. G. Howson’s amusement that ‘one of the questions
set to candidates for the first London University Matriculation Examination (in
1838), an examination set for students of 19 years or under who wished to enter
the university, was: “Find the sum to infinity of the series

$$\frac1{1^2} + \frac1{2^2} + \frac1{3^2} + \cdots$$

and

$$\frac1{1 \times 2} + \frac1{2 \times 3} + \frac1{3 \times 4} + \cdots.$$”

There is no indication how the examiner intended the question to be solved;
the examination syllabus, which did not include the calculus, referred only
to “arithmetical and geometrical progressions” and “arithmetic and algebra”.’
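The second series, at least, was within the syllabus: since $\frac{1}{k(k+1)} = \frac1k - \frac1{k+1}$, its partial sums telescope to $\frac{n}{n+1}$, and the sum to infinity is 1. This short sketch (illustrative name `telescoping_partial`) confirms the closed form with exact fractions.

```python
from fractions import Fraction


def telescoping_partial(n: int) -> Fraction:
    """Sum of 1/(k(k+1)) for k = 1..n, computed exactly."""
    return sum((Fraction(1, k * (k + 1)) for k in range(1, n + 1)), Fraction(0))


# Since 1/(k(k+1)) = 1/k - 1/(k+1), the partial sum collapses to 1 - 1/(n+1) = n/(n+1).
for n in (1, 2, 10, 100):
    assert telescoping_partial(n) == Fraction(n, n + 1)
print(telescoping_partial(100))   # 100/101
```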
Partly through the connection with

$$\frac1{1^2} + \frac1{2^2} + \frac1{3^2} + \cdots,$$

the number $\frac16\pi^2$ turns up surprisingly often and frequently in unexpected places,
as we shall see. A quite astonishing appearance of it is this: if we take two
positive integers at random, the probability of them being co-prime (that is,
having no common factor other than 1) is none other than 1 in $\frac16\pi^2$, that is,
$6/\pi^2 \approx 0.608$. This is so shocking that we will make the considerable effort
needed to establish the proof, but before we can do so we will need more of
Euler’s help and so we must revisit it later.
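Pending the proof, the claim is easy to test empirically. This sketch interprets "at random" as choosing uniformly from $1, \ldots, n$ and letting $n$ grow, which is an assumption about how the limit is meant; `coprime_fraction` is an illustrative name.

```python
import math


def coprime_fraction(n: int) -> float:
    """Proportion of ordered pairs (a, b) with 1 <= a, b <= n and gcd(a, b) = 1."""
    coprime = sum(1 for a in range(1, n + 1) for b in range(1, n + 1) if math.gcd(a, b) == 1)
    return coprime / (n * n)


for n in (10, 100, 1000):
    print(n, coprime_fraction(n), 6 / math.pi**2)
```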

In the final chapter we will mention three famous lists of mathematical
problems, one at the turn of the 20th century, the second near its end and the third
at the turn of the 21st century. Euler made four such. The first was read to the
Mathematics Department of the University of Berlin on 6 September 1742 and
consists of seven problems, not as a challenge to the mathematical community
(as were the others) but as a list of ideas that he considered important and on
which he was currently working. It was as follows.

1. Determination of the orbit of the comet which was observed in the month
of March in the year 1742. 

2. Theorems about the reduction of integral formulas to the quadrature of 
circles. 

3. On the finding of integrals which, if the value determined is assigned 
after the integration of the variable quantity. 

4. On the sum of series of reciprocals arising from the powers of natural 
numbers. 

5. On the integration of differential equations of higher degrees. 

6. On the properties which certain conic sections have in common with 
infinitely many other curved lines. 

7. On the resolution of the equation $dy + ay^2\,dx = bx^m\,dx$.
For the most part, they lack the crispness of the modern specification of a
problem. Problem 3 seems obscure, problem 7 is a form of the Riccati differential
equation, which we would write as

$$\frac{dy}{dx} + ay^2 = bx^m,$$

and in problem 4 we see the reference to Zeta series.

There is ample evidence that his efforts were in part rewarded in his
Introductio of 1748. In it, by equating other coefficients, he listed results for $\zeta(x)$
for $x = 2, 4, 6, \ldots, 26$. For example,

$$\zeta(4) = \frac1{1^4} + \frac1{2^4} + \frac1{3^4} + \cdots = \frac{\pi^4}{90}.$$

To demonstrate the difficulty of the problem, the sum for $x = 26$ is

$$\zeta(26) = \frac1{1^{26}} + \frac1{2^{26}} + \frac1{3^{26}} + \cdots = \frac{2^{24} \times 76\,977\,927 \times \pi^{26}}{27!} = \frac{1\,315\,862}{11\,094\,481\,976\,030\,578\,125}\,\pi^{26},$$

and all without a calculator.

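Using similar ideas he was able to prove, for example,

$$\frac1{1^2} + \frac1{3^2} + \frac1{5^2} + \cdots = \frac{\pi^2}{8},$$

$$\frac1{1^4} + \frac1{3^4} + \frac1{5^4} + \cdots = \frac{\pi^4}{96},$$

$$\frac1{1^3} - \frac1{3^3} + \frac1{5^3} - \cdots = \frac{\pi^3}{32},$$

$$\frac1{1^5} - \frac1{3^5} + \frac1{5^5} - \cdots = \frac{5\pi^5}{1536}.$$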
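A numerical sanity check of these four results, summing $10^5$ terms of each (a sketch; the cut-off `N` is an arbitrary choice). The two alternating series converge very fast; the series of odd reciprocal squares leaves a tail of roughly $1/(4N)$.

```python
import math

N = 10**5  # number of terms kept in each series

odd_squares = math.fsum(1 / k**2 for k in range(1, 2 * N, 2))
odd_fourths = math.fsum(1 / k**4 for k in range(1, 2 * N, 2))
alt_cubes = math.fsum((-1) ** j / (2 * j + 1) ** 3 for j in range(N))
alt_fifths = math.fsum((-1) ** j / (2 * j + 1) ** 5 for j in range(N))

print(odd_squares, math.pi**2 / 8)
print(odd_fourths, math.pi**4 / 96)
print(alt_cubes, math.pi**3 / 32)
print(alt_fifths, 5 * math.pi**5 / 1536)
```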

In a later paper, published in 1750, he recorded one of his major triumphs by
solving the general problem for even $n$, showing that

$$\zeta(2n) = \sum_{r=1}^{\infty} \frac1{r^{2n}} = (-1)^{n-1} \frac{(2\pi)^{2n}}{2(2n)!} B_{2n},$$

where the $B_{2n}$ are the Bernoulli Numbers, which we will discuss in Chapter 10.

Astonishingly, no general formula is known for ζ(n) for n odd (and of course greater than 1), which makes the last two results listed above all the more tantalizing.
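Euler's formula can be turned into exact rational arithmetic. The sketch below is one way to do it. It assumes that the Bernoulli numbers are generated by the standard recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0$, since the text defers them to Chapter 10. It returns the rational coefficient q in ζ(2n) = qπ^{2n} and reproduces the values quoted above for ζ(4) and ζ(26).

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(m_max):
    """B_0, ..., B_{m_max} from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

def zeta_even_coefficient(n, B):
    """Rational q with zeta(2n) = q * pi^(2n), from (-1)^(n-1) (2 pi)^(2n) B_2n / (2 (2n)!)."""
    return Fraction((-1) ** (n - 1) * 2 ** (2 * n), 2 * factorial(2 * n)) * B[2 * n]

B = bernoulli_numbers(26)
print(zeta_even_coefficient(2, B))    # 1/90
print(zeta_even_coefficient(13, B))   # 1315862/11094481976030578125
```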




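For interest, here are the first few sums to several decimal places:

$$\zeta(3) = \frac{1}{1^3} + \frac{1}{2^3} + \frac{1}{3^3} + \cdots = 1.2020569031\ldots,$$

$$\zeta(5) = \frac{1}{1^5} + \frac{1}{2^5} + \frac{1}{3^5} + \cdots = 1.0369277551\ldots,$$

$$\zeta(7) = \frac{1}{1^7} + \frac{1}{2^7} + \frac{1}{3^7} + \cdots = 1.0083492773\ldots.$$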

ζ(3) is another of the many named mathematical constants; it is called Apéry's constant, honouring Roger Apéry, who, in 1978, proved it to be irrational. Not even that is known of any one of the other odd values, even though the sums for even n are obviously transcendental (each is a rational multiple of a power of π). Given the pattern that exists for even powers, it is tempting to conjecture that

$$\zeta(2n+1) = \sum_{r=1}^{\infty} \frac{1}{r^{2n+1}} = \frac{p}{q}\,\pi^{2n+1}$$

for some integers p and q, which, in the case n = 2, would amount to trying to prove that

$$\frac{\zeta(5)}{\pi^5} = \frac{1.0369277551\ldots}{\pi^5} = 0.003388434\ldots$$

is rational. Umm. There is inexorable progress though. In 2000, T. Rivoal proved that there are infinitely many integers n such that ζ(2n + 1) is irrational, and subsequently in 2001 that at least one of ζ(5), ζ(7), ζ(9), ..., ζ(21) is irrational. Again in 2001 this result was tightened by Zudilin to replace 21 by 11.


4.2 Where x Is a Real Number

We have been looking at ζ(n) for n a positive integer. The earlier proof showed that

$$\zeta(n) = \sum_{r=1}^{\infty} \frac{1}{r^n}, \quad n > 1,$$

is meaningful and made no assumption about n being an integer. If we replace n by the continuous, real variable x > 1, we meet the real 'Zeta function', whose graph is shown in Figure 4.1.

The vertical asymptote is at x = 1 because of the divergence of ζ(1) and the horizontal asymptote is at y = 1 since the terms of ζ(x) beyond the first contribute vanishingly small amounts as x → ∞.

Figure 4.1. The Zeta function.

The asymptotic behaviour can be more exactly measured. If we overestimate the area under y = 1/u^x for fixed x, between u = 1 and u = n + 1, by rectangles of width 1, as in Figure 4.2, we have that

$$0 < \sum_{u=1}^{n} \frac{1}{u^x} - \int_1^{n+1} \frac{du}{u^x} < 1,$$

since this quantity is just the sum of the areas of the shaded, curved triangles at the top of each region, which can be slid to the left to fit in the first rectangle, which has area 1. This means that

$$0 < \sum_{u=1}^{n} \frac{1}{u^x} - \frac{1 - (n+1)^{1-x}}{x - 1} < 1$$

and so, as n → ∞, |ζ(x) − 1/(x − 1)| ≤ 1, which means that

$$|(x-1)\zeta(x) - 1| \le |x - 1|.$$

Now take the limit as x → 1⁺ and we have that (x − 1)ζ(x) → 1 as x → 1⁺.
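To watch these asymptotes numerically, here is a hedged Python sketch. Summing 1/u^x directly is hopeless near x = 1. The helper `zeta` (our own name) therefore adds the first 999 terms and estimates the rest with the opening terms of the Euler-Maclaurin formula, a standard technique not used in the text. It reproduces the decimals quoted earlier for ζ(3), ζ(5) and ζ(7), and it shows the bound |(x − 1)ζ(x) − 1| ≤ |x − 1| at work as x approaches 1.

```python
import math

def zeta(x, N=1000):
    """Real zeta(x), x > 1: the sum of 1/u^x for u < N plus an Euler-Maclaurin
    estimate of the tail (integral + half term + first Bernoulli correction)."""
    if x <= 1:
        raise ValueError("the series converges only for x > 1")
    head = math.fsum(u ** -x for u in range(1, N))
    tail = N ** (1 - x) / (x - 1) + 0.5 * N ** -x + x * N ** (-x - 1) / 12
    return head + tail

for x in (3, 5, 7):
    print(f"zeta({x}) = {zeta(x):.10f}")

for x in (2.0, 1.1, 1.01, 1.001):
    value = (x - 1) * zeta(x)
    print(f"x = {x:<6} (x-1)zeta(x) = {value:.6f}  within |x-1| of 1: {abs(value - 1) <= x - 1}")
```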

We will use this result later and also extend the definition of the Zeta function 
once again, this time from real x to complex z, with profound implications. 


4.3 Two Results to End With 


Before we move on to see how the Zeta functions bring about the existence of Gamma (with a little help from Euler), we will mention two miscellaneous and nice results related to them.

Firstly, we know that the prime series

$$\sum_{p\ \text{prime}} \frac{1}{p}$$

diverges, but it must be that the series of reciprocals of higher prime powers converges. Exactly to what numbers is yet another question with no answer but we can at least conclude that

$$\sum_{p\ \text{prime}} \frac{1}{p^n} \le \sum_{p\ \text{prime}} \frac{1}{p^2} < \sum_{r=2}^{\infty} \frac{1}{r^2} = \frac{\pi^2}{6} - 1 < 1 \quad \text{for integers } n > 1,$$

which isn't much but it's about as much as we can expect for so little work in this most difficult area of mathematics.
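Here is a short numerical check, assuming a basic sieve of Eratosthenes (the helper `primes_up_to` is ours). Summing over the primes up to 10^5 gives a lower estimate of each prime series, and every one sits comfortably below π²/6 − 1.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(sieve) if flag]

primes = primes_up_to(100_000)
bound = math.pi ** 2 / 6 - 1          # sum of 1/r^2 for r >= 2
for n in (2, 3, 4):
    partial = math.fsum(p ** -n for p in primes)   # truncated sum over p <= 10^5
    print(f"n = {n}: {partial:.6f} < {bound:.6f}? {partial < bound}")
```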

The final item we will mention is a 1697 result of Johann Bernoulli, and is very easy on the eye. It is that

$$\int_0^1 \frac{1}{x^x}\,dx = \frac{1}{1^1} + \frac{1}{2^2} + \frac{1}{3^3} + \cdots.$$

The integral is improper, with $0^0$ indeterminate, but we also have the well-known result that $\lim_{x\to 0^+} x^x = 1$ and with this in place we can indulge in a feast of integration by parts to prove the formula

$$\int_0^1 \frac{1}{x^x}\,dx = \int_0^1 e^{-x\ln x}\,dx = \int_0^1 \sum_{r=0}^{\infty} \frac{(-x\ln x)^r}{r!}\,dx = \sum_{r=0}^{\infty} \frac{1}{r!}\int_0^1 (-x\ln x)^r\,dx = \sum_{r=0}^{\infty} \frac{(-1)^r}{r!}\int_0^1 x^r \ln^r x\,dx.$$
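Now we attack the integral using parts and the fact that ln x grows much more slowly than any power of x to get

$$\begin{aligned}\int_0^1 x^r \ln^r x\,dx &= \left[\frac{x^{r+1}}{r+1}\ln^r x\right]_0^1 - \frac{r}{r+1}\int_0^1 x^r \ln^{r-1} x\,dx = -\frac{r}{r+1}\int_0^1 x^r \ln^{r-1} x\,dx\\ &= \cdots = (-1)^r \frac{r!}{(r+1)^r}\int_0^1 x^r\,dx = (-1)^r \frac{r!}{(r+1)^{r+1}}\end{aligned}$$

and so

$$\int_0^1 \frac{1}{x^x}\,dx = \sum_{r=0}^{\infty} \frac{(-1)^r}{r!}\cdot(-1)^r \frac{r!}{(r+1)^{r+1}} = \sum_{r=0}^{\infty} \frac{1}{(r+1)^{r+1}} = \frac{1}{1^1} + \frac{1}{2^2} + \frac{1}{3^3} + \cdots.$$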
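Bernoulli's identity is easy to test numerically. In this sketch, composite Simpson's rule (our choice of quadrature, with an arbitrary 20 000 subintervals) integrates x^{−x}, using the limiting value 1 at x = 0. The result is then compared with the first nineteen terms of the series.

```python
import math

def x_to_minus_x(x):
    """x^(-x) = exp(-x ln x), extended by its limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.exp(-x * math.log(x))

def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = math.fsum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    even = math.fsum(f(a + 2 * k * h) for k in range(1, n // 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

integral = simpson(x_to_minus_x, 0.0, 1.0)
series = math.fsum(r ** -r for r in range(1, 20))   # 1/1^1 + 1/2^2 + ... + 1/19^19
print(f"integral = {integral:.10f}, series = {series:.10f}")
```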
CHAPTER FIVE


Gamma’s Birthplace 


The mathematician may be compared to a designer of garments, who is utterly 
oblivious of the creatures whom his garments may fit. To be sure, his art origi- 
nated in the necessity for clothing such creatures, but this was long ago; to this 
day a shape will occasionally appear which will fit into the garment as if the 
garment had been made for it. Then there is no end of surprise and delight! 

Tobias Dantzig 


5.1 Advent 


So, the harmonic series diverges, slowly. Just how slowly can be measured using its interpretation as a discrete logarithm. The area $\int_1^n (1/x)\,dx = \ln n$ is bounded below by the areas of the underestimating rectangles and above by the areas of the overestimating rectangles, which using Figures 5.1 and 5.2 results in the inequality

$$\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} < \int_1^n \frac{1}{x}\,dx < 1 + \frac{1}{2} + \cdots + \frac{1}{n-1},$$

i.e.

$$H_n - 1 < \ln n < H_n - \frac{1}{n}$$

or

$$\ln n + \frac{1}{n} < H_n < \ln n + 1.$$

We have an estimate of $H_n$ as ln n with an error of at least 1/n and at most 1, with $H_n$ confined between the curves, as shown in Figure 5.3. Put another way,

$$\frac{1}{n} < H_n - \ln n < 1$$

and so, if the limit exists, $0 \le \lim_{n\to\infty}(H_n - \ln n) \le 1$.





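If we overestimate using trapezia, as in Figure 5.4, we achieve a different insight. Because 1/x is convex, each trapezium lies above the curve, so

$$\ln n = \int_1^n \frac{1}{x}\,dx < \frac{1}{2}\left(1 + \frac{1}{2}\right) + \frac{1}{2}\left(\frac{1}{2} + \frac{1}{3}\right) + \cdots + \frac{1}{2}\left(\frac{1}{n-1} + \frac{1}{n}\right) = H_n - \frac{1}{2} - \frac{1}{2n}.$$

Therefore, since the trapezia fit the curve closely,

$$H_n \approx \ln n + \frac{1}{2} + \frac{1}{2n},$$

which means that

$$H_n - \ln n \approx \frac{1}{2} + \frac{1}{2n}$$

(with the left-hand side in fact the larger), and so we may reasonably think that

$$\lim_{n\to\infty}(H_n - \ln n) \approx 0.5.$$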
So, it looks like the difference between the harmonic series and the natural
logarithm might tend to a number between 0 and 1 and near 0.5.
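The bounds above can be checked directly. This small Python sketch (the helper name `harmonic` is ours) computes H_n − ln n and tests both the rectangle bounds 1/n < H_n − ln n < 1 and the trapezium bound H_n − ln n > 1/2 + 1/(2n).

```python
import math

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum for accuracy."""
    return math.fsum(1 / r for r in range(1, n + 1))

for n in (10, 100, 1_000, 100_000):
    d = harmonic(n) - math.log(n)
    rect_ok = 1 / n < d < 1                 # rectangle bounds
    trap_ok = d > 0.5 + 1 / (2 * n)         # trapezium (convexity) bound
    print(f"n = {n:>6}: H_n - ln n = {d:.8f}  {rect_ok}  {trap_ok}")
```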


5.2 Birth 


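We have already mentioned that in 1735 Euler established the remarkable fact that

$$\zeta(2) = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots = \frac{\pi^2}{6},$$

and thereby solved the 'Basel problem', which had been frustrating mathematicians for years. In that same year, he published the paper 'De Progressionibus harmonicis observationes', which disclosed a further natural interest in Zeta functions and which led to Gamma coming into existence. We will look at the relevant part of the paper, using Euler's own invention of Σ notation, although he did not make use of it himself on this occasion. Using the ubiquitous result

$$\ln(1 + x) = x - \frac{1}{2}x^2 + \frac{1}{3}x^3 - \frac{1}{4}x^4 + \cdots, \quad -1 < x \le 1,$$

he replaced x by 1/r to get

$$\ln\left(1 + \frac{1}{r}\right) = \frac{1}{r} - \frac{1}{2r^2} + \frac{1}{3r^3} - \frac{1}{4r^4} + \cdots$$

and so

$$\frac{1}{r} = \ln\left(\frac{r+1}{r}\right) + \frac{1}{2r^2} - \frac{1}{3r^3} + \frac{1}{4r^4} - \cdots$$

and

$$\sum_{r=1}^{n} \frac{1}{r} = \sum_{r=1}^{n} \ln\left(\frac{r+1}{r}\right) + \frac{1}{2}\sum_{r=1}^{n} \frac{1}{r^2} - \frac{1}{3}\sum_{r=1}^{n} \frac{1}{r^3} + \frac{1}{4}\sum_{r=1}^{n} \frac{1}{r^4} - \cdots;$$

therefore,

$$\sum_{r=1}^{n} \frac{1}{r} = \sum_{r=1}^{n} \big(\ln(r+1) - \ln r\big) + \frac{1}{2}\sum_{r=1}^{n} \frac{1}{r^2} - \frac{1}{3}\sum_{r=1}^{n} \frac{1}{r^3} + \frac{1}{4}\sum_{r=1}^{n} \frac{1}{r^4} - \cdots$$

and, since the first sum on the right telescopes,

$$\sum_{r=1}^{n} \frac{1}{r} = \ln(n+1) + \frac{1}{2}\sum_{r=1}^{n} \frac{1}{r^2} - \frac{1}{3}\sum_{r=1}^{n} \frac{1}{r^3} + \frac{1}{4}\sum_{r=1}^{n} \frac{1}{r^4} - \cdots,$$

which makes

$$\sum_{r=1}^{n} \frac{1}{r} - \ln(n+1) = \frac{1}{2}\sum_{r=1}^{n} \frac{1}{r^2} - \frac{1}{3}\sum_{r=1}^{n} \frac{1}{r^3} + \frac{1}{4}\sum_{r=1}^{n} \frac{1}{r^4} - \cdots.$$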
In the limit as n → ∞ we have the difference between the divergent harmonic series and the divergent natural logarithm expressed in terms of an infinite number of convergent Zeta series, the sums of which would therefore be very nice to know. We have seen that Euler did eventually solve the general problem of summing the Zeta series for even powers but also that the problem with odd powers remains open to this day; so, of course, he was bound to resort to numeric methods to approximate the sum on the right-hand side, which in the 'De Progressionibus' he announced as 0.577 218.
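Euler's right-hand side converges too slowly to add up term by term, which is why he needed numerical ingenuity. The sketch below is not Euler's method. It rewrites each ζ(k) as 1 + (ζ(k) − 1) and uses 1/2 − 1/3 + 1/4 − ⋯ = 1 − ln 2, which leaves a series whose terms shrink roughly like 2^{−k}. The helper `zeta_minus_one` (our own) sums ζ(s) − 1 directly and adds an Euler-Maclaurin tail estimate. Rounded to six places the result is 0.577216, so Euler's 0.577 218 was correct to five decimal places.

```python
import math

def zeta_minus_one(s, N=1000):
    """zeta(s) - 1 for s > 1: direct sum over 2 <= n < N plus Euler-Maclaurin tail terms."""
    head = math.fsum(n ** -s for n in range(2, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail

# (1/2)zeta(2) - (1/3)zeta(3) + (1/4)zeta(4) - ... rewritten as
# (1 - ln 2) + sum_{k>=2} (-1)^k (zeta(k) - 1)/k, whose terms shrink roughly like 2^-k.
euler_rhs = (1 - math.log(2)) + math.fsum(
    (-1) ** k * zeta_minus_one(k) / k for k in range(2, 60)
)
print(f"{euler_rhs:.6f}")
```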

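In fact, Euler had the logarithm on the right-hand side to give

$$\sum_{r=1}^{n} \frac{1}{r} = \ln(n+1) + \frac{1}{2}\sum_{r=1}^{n} \frac{1}{r^2} - \frac{1}{3}\sum_{r=1}^{n} \frac{1}{r^3} + \frac{1}{4}\sum_{r=1}^{n} \frac{1}{r^4} - \cdots.$$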


In his own words from 'De Progressionibus':

Quae series cum sint convergentes, si proxime summentur prodibit

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{i} = \log(i + 1) + 0.577\,218.$$

Si summa dicatur s, foret, ut supra fecimus,

$$ds = \frac{di}{i + 1},$$

ideoque s = log(i + 1) + C. Hujus igitur quantitatis constantis C valorem deteximus, quippe est C = 0.577 218.

Moving from 18th-century Latin to 21st-century English: 

This series, since each term is convergent taken one after the other, will proceed

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{i} = \log(i + 1) + 0.577\,218.$$

If the sum is called s it would be that

$$ds = \frac{di}{i + 1},$$

as we have seen above, and so s = log(i + 1) + C. Therefore, we have revealed the value of this constant to this accuracy to be C = 0.577 218.


And a birth is recorded under the name of C. Other letters have subsequently been used but it is γ that has become permanently attached to the number which, as we mentioned in the Introduction, he regarded as 'worthy of serious consideration'. He lavished considerable attention on it himself, partly hoping to identify it in terms of some other known constant or function. In 1781, under the name of C (albeit with the logarithm of n rather than n + 1), he communicated to the Petersburg Academy the memoir 'De Numero Memorabili in Summatione Progressionis Harmonicae Naturalis Occurrente', which was entirely devoted to its study, and in which he admits that its nature still eluded him. He remarked that he had hoped that his C was itself the logarithm of another number of import but, having failed to identify any such number, continued by giving a whole list of series by which its approximate value might be calculated, two of which were

$$\sum_{i=2}^{\infty} \frac{1}{i}\big(\zeta(i) - 1\big) = 1 - \gamma$$

and

$$\sum_{i=1}^{\infty} \frac{\zeta(2i+1) - 1}{(2i+1)\,2^{2i}} = 1 - \gamma - \ln\frac{3}{2}.$$

He used the first (which we will prove in Chapter 12) to evaluate the constant 
to five decimal places and the second to evaluate it to 12 decimal places of 
accuracy. 
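Both of Euler's series are easy to evaluate by machine. This sketch again uses our own helper `zeta_minus_one`, which sums the Zeta series directly and adds an Euler-Maclaurin tail estimate. The first series gains roughly one binary digit per term while the second gains roughly four, which is in keeping with Euler getting far more accuracy from the second.

```python
import math

def zeta_minus_one(s, N=1000):
    """zeta(s) - 1 for s > 1: direct sum over 2 <= n < N plus Euler-Maclaurin tail terms."""
    head = math.fsum(n ** -s for n in range(2, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail

# First series: sum_{i>=2} (zeta(i) - 1)/i = 1 - gamma; terms shrink roughly like 2^-i.
gamma_first = 1 - math.fsum(zeta_minus_one(i) / i for i in range(2, 70))
# Second series: sum_{i>=1} (zeta(2i+1) - 1)/((2i+1) 2^(2i)) = 1 - gamma - ln(3/2);
# terms shrink roughly like 16^-i.
gamma_second = 1 - math.log(1.5) - math.fsum(
    zeta_minus_one(2 * i + 1) / ((2 * i + 1) * 4 ** i) for i in range(1, 25)
)
print(f"{gamma_first:.12f}\n{gamma_second:.12f}")
```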

The years have passed and the number has indeed been afforded that ‘serious 
consideration’ by any number of mathematicians but has hardly cooperated, 
and even at its venerable age of 267+ it is still so deeply shrouded in mystery 
that it is not even known if it is a fraction. In fact, the great G. H. Hardy, whom 
we will soon discuss, offered to vacate his Savilian Chair at Oxford to anyone 
who could prove Gamma to be irrational!
The Gamma Function


There is no branch of mathematics, however abstract, which may not some day 
be applied to phenomena of the real world. 

Nikolai Lobatchevsky (1792-1856) 

We will now look at that second ‘advanced’ function and its link with Euler’s 
constant and with the Zeta function. 


6.1 Exotic Definitions . . .
The striking integral

$$\int_0^1 \left(\ln\frac{1}{t}\right)^{x-1} dt$$

occupied some of Euler's many mathematical thoughts during the years 1729 and 1730 and in a letter to Christian Goldbach (1690-1764), dated 8 January 1730, he proposed its use in a quite startling way. It converges for x > 0 and can be considered as a function of x in that domain, a function whose properties are surprising and unexpectedly useful. In 1809 Adrien-Marie Legendre (1752-1833) gave it the name Gamma and the matching symbol Γ and so we have

$$\Gamma(x) = \int_0^1 \left(\ln\frac{1}{t}\right)^{x-1} dt = \int_0^1 (-\ln t)^{x-1}\,dt, \quad x > 0.$$

Substituting u = −ln t (and then renaming u as t) results in the useful alternative

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \quad x > 0.$$

Clearly,

$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = \left[-e^{-t}\right]_0^\infty = 1.$$
Figure 6.1. The extended Gamma function.


And also

$$\Gamma(x+1) = \int_0^\infty t^x e^{-t}\,dt = \left[-t^x e^{-t}\right]_0^\infty + x\int_0^\infty t^{x-1} e^{-t}\,dt = x\Gamma(x).$$


This last property is its 'functional relationship', which can be used to extend the definition beyond x > 0 (we will meet a further critically important functional relationship in the final chapter) by rewriting the identity as

$$\Gamma(x) = \frac{\Gamma(x+1)}{x}$$

and so, for example, Γ(−1/2) = −2Γ(1/2). The vertical asymptote at x = 0 prevents the function being meaningful at zero and the negative integers but otherwise the extension is to all of ℝ (and later to ℂ), minus those integers. Its graph is shown in Figure 6.1.
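Here is a minimal sketch of the extension. It assumes Python's `math.gamma` as a stand-in for the integral on x > 0, because the recursion is the point here. The rearranged functional relationship is applied until the argument becomes positive.

```python
import math

def gamma_extended(x):
    """Gamma on the reals: math.gamma stands in for the integral when x > 0, and
    Gamma(x) = Gamma(x + 1)/x is applied repeatedly to reach negative x."""
    if x > 0:
        return math.gamma(x)
    if x == math.floor(x):
        raise ValueError("Gamma is not defined at 0 or the negative integers")
    return gamma_extended(x + 1) / x

print(gamma_extended(-0.5), -2 * math.sqrt(math.pi))   # Gamma(-1/2) = -2 Gamma(1/2)
print(gamma_extended(-2.5), math.gamma(-2.5))          # agrees with the library's extension
```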

The function begins to reveal some of its subtleties when we take x = n to be a positive integer, since the functional relationship becomes

$$\Gamma(n) = (n-1)\Gamma(n-1) = (n-1)(n-2)\Gamma(n-2) = (n-1)(n-2)(n-3)\Gamma(n-3) = \cdots = (n-1)!$$


and so the Gamma function can be thought of as an extension of the factorial function, which is defined only for positive integers. If we allow the exclamation mark to be used in this extended sense (rather than using Γ) we discover the painful 'factorial fact', disbelieved by so many students, that

$$0! = (1-1)! = \Gamma(1) = 1.$$
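If we accept a standard result that

$$\int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2},$$

we can easily develop other exotic looking things such as

$$\left(\tfrac{1}{2}\right)! = \Gamma\left(\tfrac{3}{2}\right) = \tfrac{1}{2}\Gamma\left(\tfrac{1}{2}\right) = \tfrac{1}{2}\int_0^\infty t^{-1/2} e^{-t}\,dt = \int_0^\infty e^{-u^2}\,du = \frac{\sqrt{\pi}}{2}$$

(using t = u²) and the possibly even more striking

$$\left(-\tfrac{1}{2}\right)! = \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.$$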


Of course, infinitely many exact values of Γ can be generated in this way, but interestingly there is no known exact value for Γ(1/3) or Γ(1/4) or infinitely many other values, although many are known to be transcendental.

In fact, on 13 October 1729, Euler had already proposed to Goldbach the definition

$$\Gamma(x) = \lim_{r\to\infty} \Gamma_r(x),$$

where

$$\Gamma_r(x) = \frac{r!\,r^x}{x(1+x)(2+x)\cdots(r+x)} = \frac{r^x}{x\left(1+\frac{x}{1}\right)\left(1+\frac{x}{2}\right)\cdots\left(1+\frac{x}{r}\right)},$$

and for the moment this turns out to be a more useful form than the previous two.

It is hardly obvious that this is in fact the Gamma function, but we can recover the original definition by establishing that in the limit the functional relationship and boundary condition are satisfied:

$$\Gamma_r(x+1) = \frac{r!\,r^{x+1}}{(x+1)(x+2)\cdots(x+r)(x+1+r)} = \frac{r}{x+r+1}\,x\Gamma_r(x),$$

$$\Gamma(x+1) = \lim_{r\to\infty}\Gamma_r(x+1) = \lim_{r\to\infty}\frac{r}{x+r+1}\,x\Gamma_r(x) = x\Gamma(x),$$

which is the functional relationship, and

$$\Gamma_r(1) = \frac{r!\,r}{(1+1)(1+2)\cdots(1+r)} = \frac{r}{r+1},$$

$$\Gamma(1) = \lim_{r\to\infty}\Gamma_r(1) = \lim_{r\to\infty}\frac{r}{r+1} = 1,$$

and the boundary condition is indeed satisfied.

Figure 6.2. The factorial function.
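Euler's limit converges, but slowly: the relative error behaves roughly like x(x + 1)/(2r). The sketch below (function name ours, restricted to x > 0 so that logarithms can be taken) evaluates Γ_r(x) through logarithms to avoid enormous factorials. It then compares the result with `math.gamma`.

```python
import math

def gamma_r(x, r):
    """Euler's Gamma_r(x) = r! r^x / (x(1 + x)(2 + x)...(r + x)) for x > 0, via
    ln Gamma_r(x) = x ln r - ln x - sum_{k=1}^{r} ln(1 + x/k)."""
    log_value = x * math.log(r) - math.log(x)
    log_value -= math.fsum(math.log1p(x / k) for k in range(1, r + 1))
    return math.exp(log_value)

for x in (0.5, 1.0, 2.5):
    print(x, gamma_r(x, 100_000), math.gamma(x))
```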

6.2 ... Yet Reasonable Definitions 

This all might seem a bit contrived. Why generalize the factorial in such a 
seemingly bizarre way? After all, if we think about the problem geometrically 
we have the discrete factorial as in Figure 6.2 and what we want to do is to join 
the dots in a useful way. However we join them, we will want an explicit formula and if we write the extension as f(x), then certainly we want f(1) = 1 and f(x + 1) = x f(x). Do these conditions restrict us to a single way of joining up those dots? The answer is 'no' but we need only one more reasonable condition to change that answer to 'yes' and that condition is pointed to by a significant result of 1922, known as the Bohr-Mollerup Theorem. If we look at the plot of ln(Γ(x)) for x > 0, as shown in Figure 6.3, we see that it is always convex. The Bohr-Mollerup Theorem tells us that, with the two conditions above and with ln(f(x)) convex for x > 0, f(x) must be the Gamma function: no other function will do!
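The Bohr-Mollerup condition can be probed numerically. This sketch estimates second derivatives of ln Γ by central differences, using `math.lgamma`. It contrasts them with a hypothetical impostor f(x) = Γ(x)(1 + 0.1 sin 2πx) of our own construction. The impostor satisfies f(1) = 1 and f(x + 1) = xf(x), yet ln f fails to be convex, so the convexity condition is what rules it out.

```python
import math

def second_difference(g, x, h=1e-3):
    """Central second difference, approximating g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)

def log_impostor(x, a=0.1):
    """ln f for a hypothetical impostor f(x) = Gamma(x) * (1 + a sin(2 pi x)):
    it satisfies f(1) = 1 and f(x + 1) = x f(x) but is not log-convex."""
    return math.lgamma(x) + math.log(1 + a * math.sin(2 * math.pi * x))

def impostor(x):
    return math.exp(log_impostor(x))

grid = [0.2 + 0.1 * k for k in range(99)]      # x from 0.2 to 10.0
print(all(second_difference(math.lgamma, x) > 0 for x in grid))   # convexity of ln Gamma on the grid
print(second_difference(log_impostor, 1.25))                       # negative: ln f is not convex here
print(impostor(1.0), impostor(3.3) / impostor(2.3))                # f(1) and f(x + 1)/f(x) at x = 2.3
```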


6.3 Gamma Meets Gamma 


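Karl Weierstrass (1815-1897) rewrote the definition and by so doing brought about the link between Gamma the number and Gamma the function:

$$\Gamma_r(x) = \frac{r^x}{x\prod_{k=1}^{r}\left(1+\frac{x}{k}\right)} = \frac{e^{x(\ln r - 1 - 1/2 - 1/3 - \cdots - 1/r)}\,e^{x + x/2 + x/3 + \cdots + x/r}}{x\prod_{k=1}^{r}\left(1+\frac{x}{k}\right)} = \frac{e^{-x(1 + 1/2 + 1/3 + \cdots + 1/r - \ln r)}}{x}\prod_{k=1}^{r}\frac{e^{x/k}}{1+\frac{x}{k}}$$

and so

$$\frac{1}{\Gamma(x)} = \lim_{r\to\infty}\frac{1}{\Gamma_r(x)} = x e^{\gamma x}\prod_{r=1}^{\infty}\left(1+\frac{x}{r}\right)e^{-x/r},$$

with

$$\lim_{r\to\infty}(H_r - \ln r) = \gamma.$$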
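The Weierstrass product can be tested by truncating it. In this sketch the value of γ is hard-coded, which is an assumption: it uses the standard decimal expansion. The truncation at 100 000 factors is arbitrary. The product converges slowly, with a relative error of roughly x²/(2 · terms).

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant to double precision (assumed known here)

def reciprocal_gamma(x, terms=100_000):
    """Truncated Weierstrass product: x e^(gamma x) prod_{r=1}^{terms} (1 + x/r) e^(-x/r)."""
    log_prod = math.fsum(math.log1p(x / r) - x / r for r in range(1, terms + 1))
    return x * math.exp(GAMMA * x + log_prod)

for x in (0.5, 1.0, 3.7):
    print(x, reciprocal_gamma(x), 1 / math.gamma(x))
```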


If we take this a bit further, we have

$$-\ln\Gamma(x) = \ln x + \gamma x + \sum_{r=1}^{\infty}\left(\ln\left(1+\frac{x}{r}\right) - \frac{x}{r}\right).$$
Differentiating both sides with respect to x and moving the minus sign across gives

$$\frac{\Gamma'(x)}{\Gamma(x)} = -\frac{1}{x} - \gamma + \sum_{r=1}^{\infty}\left(\frac{1}{r} - \frac{1/r}{1+x/r}\right) = -\gamma - \frac{1}{x} + \sum_{r=1}^{\infty}\left(\frac{1}{r} - \frac{1}{r+x}\right),$$

which defines the Digamma (or Psi) function Ψ(x) = Γ′(x)/Γ(x). Now, evaluating this at x = 1 gives

$$\Psi(1) = -\gamma - 1 + \sum_{r=1}^{\infty}\left(\frac{1}{r} - \frac{1}{r+1}\right) = -\gamma - 1 + 1 = -\gamma$$

and so, since Γ(1) = 1, Γ′(1) = −γ, and we have the geometrically appealing result that γ is numerically (that is, up to sign) the gradient of the Gamma function at the point with x-coordinate 1. If we incorporate the 1/x into the sum, we have that

$$\Psi(x) = -\gamma + \sum_{r=1}^{\infty}\left(\frac{1}{r} - \frac{1}{r+x-1}\right)$$

and so

$$\Psi(x+1) - \Psi(x) = \sum_{r=1}^{\infty}\left(\frac{1}{r+x-1} - \frac{1}{r+x}\right) = \frac{1}{x}$$

and we have a familiar looking recurrence relation

$$\Psi(x+1) = \Psi(x) + \frac{1}{x},$$

familiar because we can recall the recurrence definition of the harmonic series as

$$H_r = H_{r-1} + \frac{1}{r}$$

for r > 1, with $H_1 = 1$. Taking x as the positive integer n, using the condition Ψ(1) = −γ and chasing the recurrence relation down those integers results in the nice relationship $\Psi(n) = -\gamma + H_{n-1}$ (with $H_0 = 0$).
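Here is a numerical sketch of the Digamma results, with γ hard-coded and the infinite sum truncated. The remainder is approximated by x/terms, a rough tail estimate of our own. The sketch checks Ψ(1) = −γ, Ψ(n) = −γ + H_{n−1} and the recurrence. It also estimates Γ′(1) by a central difference of `math.gamma`.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant (assumed known)

def digamma(x, terms=100_000):
    """Psi(x) = -gamma - 1/x + sum_{r>=1} (1/r - 1/(r + x)); the sum is cut off after
    `terms` terms and the remainder approximated by x/terms (our rough tail estimate)."""
    s = math.fsum(x / (r * (r + x)) for r in range(1, terms + 1))
    return -GAMMA - 1 / x + s + x / terms

print(digamma(1.0))                                     # should be close to -gamma
print(digamma(5.0), -GAMMA + (1 + 1 / 2 + 1 / 3 + 1 / 4))   # Psi(5) = -gamma + H_4
h = 1e-5
print((math.gamma(1 + h) - math.gamma(1 - h)) / (2 * h))   # slope of Gamma at x = 1
```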


6.4 Complement and Beauty 

In this final section we will establish an important formula involving the Gamma 
function (again originally discovered by Euler), and a beautiful and far-reaching 
connection between it and the Zeta function. 
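Using the earlier result we can write

$$\frac{1}{\Gamma(x)}\,\frac{1}{\Gamma(-x)} = -x^2 e^{\gamma x} e^{-\gamma x}\prod_{r=1}^{\infty}\left(1+\frac{x}{r}\right)e^{-x/r}\left(1-\frac{x}{r}\right)e^{x/r} = -x^2\prod_{r=1}^{\infty}\left(1 - \frac{x^2}{r^2}\right),$$

but Γ(1 − x) = −xΓ(−x) and so

$$\frac{1}{\Gamma(x)}\,\frac{1}{\Gamma(1-x)} = x\prod_{r=1}^{\infty}\left(1 - \frac{x^2}{r^2}\right),$$

and since we have the magical Euler formula

$$\sin(\pi x) = \pi x\left(1 - \frac{x^2}{1^2}\right)\left(1 - \frac{x^2}{2^2}\right)\left(1 - \frac{x^2}{3^2}\right)\cdots,$$

we have that

$$\frac{1}{\Gamma(x)}\,\frac{1}{\Gamma(1-x)} = \frac{\sin(\pi x)}{\pi}$$

or the

Complement Formula

$$\Gamma(x)\Gamma(1-x) = \frac{\pi}{\sin(\pi x)},$$

which is valid whenever x and 1 − x are not zero or negative integers (that is, whenever x is not an integer). A 'reflection formula' for a function f(x) is one which relates f(x) to f(a − x) for some constant a. The Complement Formula is then the reflection formula of the Gamma function, with a = 1.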
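Here is a quick numerical check of the Complement Formula. It uses `math.gamma`, which already implements the extended function, together with a truncated version of Euler's sine product. The test points and the truncation at 200 000 factors are arbitrary choices.

```python
import math

xs = (0.1, 0.25, 0.5, 0.8, -1.3, 2.6)
for x in xs:
    lhs = math.gamma(x) * math.gamma(1 - x)
    rhs = math.pi / math.sin(math.pi * x)
    print(f"x = {x:5}: Gamma(x)Gamma(1-x) = {lhs:.12f}, pi/sin(pi x) = {rhs:.12f}")

# Euler's product: x * prod_{r <= R} (1 - x^2/r^2) should approach sin(pi x)/pi.
x, R = 0.3, 200_000
truncated = x * math.prod(1 - x * x / (r * r) for r in range(1, R + 1))
print(truncated, math.sin(math.pi * x) / math.pi)
```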

Now recall that

$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt \quad \text{for } x > 0$$

and make the change of variable t = ru to get

$$\Gamma(x) = \int_0^\infty (ru)^{x-1}e^{-ru}\,r\,du = r^x\int_0^\infty u^{x-1}e^{-ru}\,du.$$

Hence

$$\frac{1}{r^x} = \frac{1}{\Gamma(x)}\int_0^\infty u^{x-1}e^{-ru}\,du$$

and

$$\zeta(x) = \sum_{r=1}^{\infty}\frac{1}{r^x} = \frac{1}{\Gamma(x)}\sum_{r=1}^{\infty}\int_0^\infty u^{x-1}e^{-ru}\,du = \frac{1}{\Gamma(x)}\int_0^\infty u^{x-1}\sum_{r=1}^{\infty}e^{-ru}\,du,$$

having pushed the sigma through the integral, and summing the infinite geometric series, $\sum_{r=1}^{\infty} e^{-ru} = 1/(e^u - 1)$, results in the

Beautiful Formula

$$\zeta(x)\Gamma(x) = \int_0^\infty \frac{u^{x-1}}{e^u - 1}\,du,$$

which is valid for x > 1 (the integral diverges otherwise); a relationship which we will later see has far-reaching consequences.
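The formula is easy to test numerically for small integer x. In this sketch, Simpson's rule on [0, 60] stands in for the infinite integral, since the integrand is negligible beyond that point. The value of ζ(3) is hard-coded to double precision, which is an assumption (the text quotes 1.2020569031...). The integrand is also restricted to integer x ≥ 2 so that its value at u = 0 is simple.

```python
import math

def integrand(u, x):
    """u^(x-1)/(e^u - 1), with its limiting value at u = 0 (this sketch needs integer x >= 2)."""
    if u == 0.0:
        return 1.0 if x == 2 else 0.0
    return u ** (x - 1) / math.expm1(u)

def simpson(f, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    odd = math.fsum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    even = math.fsum(f(a + 2 * k * h) for k in range(1, n // 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

zeta = {2: math.pi ** 2 / 6, 3: 1.2020569031595942, 4: math.pi ** 4 / 90}
results = {}
for x in (2, 3, 4):
    # the upper limit 60 stands in for infinity: the integrand is below 60^3 e^-60 there
    results[x] = simpson(lambda u: integrand(u, x), 0.0, 60.0)
    print(f"x = {x}: integral = {results[x]:.10f}, zeta(x)Gamma(x) = {zeta[x] * math.gamma(x):.10f}")
```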
Euler's Wonderful Identity


In great mathematics there is a very high degree of unexpectedness, combined 
with inevitability and economy. 

G. H. Hardy (1877-1947) 


7.1 The All-Important Formula . . .


Euler wanted to establish the divergence of the reciprocals of the primes. We have already seen Erdős's stylish proof of this but that will not prevent us from revelling in the glory of Euler's inventiveness, particularly as it brought about a result which is the cornerstone of analytic number theory, and which we will make considerable use of later.

The positive integers are a Unique Factorization Domain, that is, every pos- 
itive integer is uniquely expressible as a product of primes (which is why 1 is 
not considered prime), and from this innocent fact Euler extracted wonder by 
producing the equivalent of the following arguments. 

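Since for any positive integer r, we can write $r = 2^{r_1}3^{r_2}5^{r_3}\cdots$ for some $r_1, r_2, r_3, \ldots \in \{0, 1, 2, 3, \ldots\}$ we have that

$$\frac{1}{r^x} = \frac{1}{(2^{r_1}3^{r_2}5^{r_3}\cdots)^x} = \frac{1}{2^{xr_1}3^{xr_2}5^{xr_3}\cdots}$$

and for x > 1

$$\zeta(x) = \sum_{r=1}^{\infty}\frac{1}{r^x} = \sum_{r_1, r_2, r_3, \ldots \ge 0}\frac{1}{2^{xr_1}3^{xr_2}5^{xr_3}\cdots} = \left(\sum_{r_1=0}^{\infty}\frac{1}{2^{xr_1}}\right)\left(\sum_{r_2=0}^{\infty}\frac{1}{3^{xr_2}}\right)\left(\sum_{r_3=0}^{\infty}\frac{1}{5^{xr_3}}\right)\cdots.$$

Now each bracket is a geometric series summing to

$$\frac{1}{1 - 1/p^x} = \frac{1}{1 - p^{-x}},$$

which means that we have

Euler's Formula

$$\zeta(x) = \sum_{r=1}^{\infty}\frac{1}{r^x} = \prod_{p\ \text{prime}}\frac{1}{1 - p^{-x}}, \quad x > 1.$$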


With this result the primes, the building blocks of the integers, are inextricably 
linked with the Zeta functions and through this link analytic number theory 
came into being. 
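Euler's Formula can be checked by truncating the product. This sketch uses a sieve of Eratosthenes (helper name ours) for the primes below 10^5. The missing factors change the product by less than one part in a million. The value of ζ(3) is hard-coded, which is an assumption.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(sieve) if flag]

primes = primes_up_to(100_000)
known = {2: math.pi ** 2 / 6, 3: 1.2020569031595942, 4: math.pi ** 4 / 90}
for x, zeta_x in known.items():
    product = math.prod(1 / (1 - p ** -x) for p in primes)
    print(f"x = {x}: product = {product:.8f}, zeta(x) = {zeta_x:.8f}")
```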


7.2 ... And a Hint of Its Usefulness 


We have already seen proofs of the infinity of the primes, but Euler's result quickly provides two more. Taking the limit as x → 1 results in

$$\sum_{r=1}^{\infty}\frac{1}{r} = \prod_{p\ \text{prime}}\frac{1}{1 - 1/p},$$

with the divergence of the harmonic series forcing the product to be infinite and therefore so must be the number of primes.

And, with the result for ζ(2), we have

$$\frac{\pi^2}{6} = \zeta(2) = \prod_{p\ \text{prime}}\frac{1}{1 - 1/p^2},$$

with the right-hand side rational if there were to be a finite number of primes; since π² is irrational (proved by Legendre in 1796) it must be that there are an infinity of primes, once again.

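Following Erdős's proof by contradiction that

$$\sum_{p\ \text{prime}}\frac{1}{p}$$

is divergent, we can now taste the flavour of an Eulerian approach and also use it to give a useful estimate of the size of its partial sums.

To do this, take logarithms of Euler's identity to get

$$\ln\zeta(x) = -\sum_{p\ \text{prime}}\ln\left(1 - \frac{1}{p^x}\right).$$

Now apply that most useful logarithmic series, $\ln(1 - t) = -t - \frac{1}{2}t^2 - \frac{1}{3}t^3 - \cdots$, with $t = 1/p^x$ to get

$$\ln\left(1 - \frac{1}{p^x}\right) = -\frac{1}{p^x} - \frac{1}{2p^{2x}} - \frac{1}{3p^{3x}} - \frac{1}{4p^{4x}} - \cdots,$$

so that

$$\ln\zeta(x) = \sum_{p\ \text{prime}}\left[\frac{1}{p^x} + \left(\frac{1}{2p^{2x}} + \frac{1}{3p^{3x}} + \frac{1}{4p^{4x}} + \cdots\right)\right] = \sum_{p\ \text{prime}}\frac{1}{p^x} + \sum_{p\ \text{prime}}\left(\frac{1}{2p^{2x}} + \frac{1}{3p^{3x}} + \frac{1}{4p^{4x}} + \cdots\right),$$

where

$$\frac{1}{2p^{2x}} + \frac{1}{3p^{3x}} + \frac{1}{4p^{4x}} + \cdots < \frac{1}{2p^{2x}} + \frac{1}{2p^{3x}} + \frac{1}{2p^{4x}} + \cdots = \frac{1}{2p^{2x}}\left(1 + \frac{1}{p^x} + \left(\frac{1}{p^x}\right)^2 + \cdots\right) = \frac{1}{2p^{2x}}\left(1 - \frac{1}{p^x}\right)^{-1}.$$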

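Now, playing see-saw with the inequality signs, $p^x > 2$, so

$$\frac{1}{p^x} < \frac{1}{2} \quad\text{and}\quad 1 - \frac{1}{p^x} > \frac{1}{2} \quad\text{and}\quad \left(1 - \frac{1}{p^x}\right)^{-1} < \left(\frac{1}{2}\right)^{-1} = 2,$$

which makes

$$\frac{1}{2p^{2x}} + \frac{1}{3p^{3x}} + \frac{1}{4p^{4x}} + \cdots < \frac{1}{p^{2x}}$$

and so

$$\sum_{p\ \text{prime}}\left(\frac{1}{2p^{2x}} + \frac{1}{3p^{3x}} + \frac{1}{4p^{4x}} + \cdots\right) < \sum_{p\ \text{prime}}\frac{1}{p^{2x}} < \sum_{n=1}^{\infty}\frac{1}{n^2} = \zeta(2).$$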



This means that

$$\ln\zeta(x) = \sum_{p\ \text{prime}}\frac{1}{p^x} + \text{error},$$

where the error is less than ζ(2). But

$$\ln\zeta(x) = \ln[(x-1)\zeta(x)] + \ln\frac{1}{x-1}$$

and so

$$\ln[(x-1)\zeta(x)] + \ln\frac{1}{x-1} = \sum_{p\ \text{prime}}\frac{1}{p^x} + \text{error}.$$

Recalling the result from pp. 43-44, that (x − 1)ζ(x) → 1 as x → 1⁺, we have that, for x > 1 close to 1,

$$\sum_{p\ \text{prime}}\frac{1}{p^x} = \ln\frac{1}{x-1} + \text{bounded error}.$$

Now let x → 1 and we have the divergence.

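We can estimate the rate of divergence by saying that for large n,

$$\prod_{p \le n}\frac{1}{1 - 1/p} \approx \sum_{r=1}^{n}\frac{1}{r} \approx \ln n$$

and taking logs gives

$$-\sum_{p\le n}\ln\left(1 - \frac{1}{p}\right) \approx \ln\ln n,$$

which means that

$$\sum_{p\le n}\left(\frac{1}{p} + \frac{1}{2p^2} + \frac{1}{3p^3} + \cdots\right) \approx \ln\ln n$$

and so, since the terms after the first in each bracket contribute only a bounded amount,

$$\sum_{p\le n}\frac{1}{p} \approx \ln\ln n.$$

The reciprocals of the primes diverge as an approximate double ln. A more careful (and rigorous) argument shows that

$$\lim_{n\to\infty}\left(\sum_{\substack{p\le n\\ p\ \text{prime}}}\frac{1}{p} - \ln\ln n\right) = \gamma + \sum_{p\ \text{prime}}\left(\ln\left(1 - \frac{1}{p}\right) + \frac{1}{p}\right) = 0.261\,497\,212\ldots,$$

with another reappearance of γ and an appearance of one of the Meissel-Mertens constants.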
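Both expressions for the Meissel-Mertens constant can be compared numerically. In this sketch the sieve helper is ours and γ is hard-coded. The partial sum minus ln ln n converges only slowly. The series involving γ converges quickly because each of its terms is about −1/(2p²).

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(sieve) if flag]

GAMMA = 0.5772156649015329        # Euler's constant, assumed known
n = 1_000_000
primes = primes_up_to(n)

# Partial sum minus ln ln n: converges (slowly) to the Meissel-Mertens constant.
mertens_from_sum = math.fsum(1 / p for p in primes) - math.log(math.log(n))
# gamma + sum over p of (ln(1 - 1/p) + 1/p): terms are about -1/(2p^2), so it converges fast.
mertens_from_gamma = GAMMA + math.fsum(math.log1p(-1 / p) + 1 / p for p in primes)
print(f"{mertens_from_sum:.6f}  {mertens_from_gamma:.9f}")
```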

Later, we will have considerably more work for Euler’s formula to do. 
A Promise Fulfilled


The good Christian should beware of mathematicians, and all those who make 
empty prophesies. The danger already exists that the mathematicians have made 
a covenant with the devil to darken the spirit and to confine man in the bonds 
of Hell. 

St Augustine (354-430) 1 

Earlier we mentioned the barely credible result that the probability of two randomly chosen integers being co-prime is 6/π². With Euler's formula, combined with several other mathematical tools listed below, we are able to prove the fact; but first those tools.

(1) In set theory, A ∩ B and A ∪ B (respectively, the intersection and union of the sets A and B) are defined to be the set of all elements common to both and the set of all elements contained in either or both, respectively. These 'binary' operations on sets give rise to an algebra, known as 'Boolean algebra', named after the English mathematician George Boole (1815-1864), from which we need only the distributive law A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C). (Incidentally, the reader in search of greater challenge than this book can offer might wish to consult G. Spencer-Brown's 1969 publication Laws of Form, in which he develops an arithmetic for Boolean algebra.)

If n(A) is taken to mean the number of elements in the set A, we can easily see that n(A ∪ B) = n(A) + n(B) − n(A ∩ B) and, using the distributive law, that

$$\begin{aligned} n(A \cup B \cup C) &= n(A \cup (B \cup C))\\ &= n(A) + n(B \cup C) - n(A \cap (B \cup C))\\ &= n(A) + n(B) + n(C) - n(B \cap C) - n((A \cap B) \cup (A \cap C))\\ &= n(A) + n(B) + n(C) - n(B \cap C) - n(A \cap B) - n(A \cap C) + n(A \cap B \cap C). \end{aligned}$$


1 Here, 'mathematician' means ‘astrologer’. 
Figure 8.1. The Floor function compared with x.

Using induction or otherwise, it is easy to prove that the pattern of 'one at a time minus two at a time plus three at a time minus four at a time, etc.' continues to any number of sets. This result is often called the inclusion-exclusion principle.
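The inclusion-exclusion principle is easy to check by brute force. This sketch, with our own function name and randomly generated sets, adds the sizes of the intersections with alternating signs. It then compares the result with the size of the union.

```python
import random
from itertools import combinations

def union_size(sets):
    """|A1 ∪ ... ∪ Ak| by inclusion-exclusion: one at a time minus two at a time plus ..."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total

random.seed(1)
sets = [set(random.sample(range(50), 20)) for _ in range(4)]
print(union_size(sets), len(set.union(*sets)))
```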

(2) An equivalent result. The expression $(1 - x_1)(1 - x_2)(1 - x_3)(1 - x_4)\cdots$ when expanded takes the form

$$1 - (x_1 + x_2 + x_3 + x_4 + \cdots) + (x_1x_2 + x_1x_3 + x_2x_3 + x_1x_4 + \cdots) - (x_1x_2x_3 + x_1x_2x_4 + x_1x_3x_4 + x_2x_3x_4 + \cdots) + \cdots,$$

where the brackets contain the sums of the products of the $x_i$ taken one at a time, two at a time, three at a time, etc., and the signs between them alternate.

(3) The modern successors of the Greatest Integer function, [ · ], are the Floor and Ceiling functions, whose names and notation were introduced in the 1960s by Kenneth E. Iverson. The definitions are, respectively,

⌊x⌋ is the greatest integer ≤ x and ⌈x⌉ is the smallest integer ≥ x.

If N and n are positive integers with n ≤ N and the sequence n, 2n, 3n, ..., xn stops where x is the biggest multiple such that xn ≤ N, then x = ⌊N/n⌋. This means that there are ⌊N/n⌋ numbers up to and including N that have n as a divisor. We will use this fact on several occasions throughout the book, and with Erdős's proof on p. 29 have already done so.

Also notice that ⌊x⌋ = x − α for some 0 ≤ α < 1, which means that, as x → ∞,

$$\frac{\lfloor x\rfloor}{x} \to 1,$$

but in a rather complicated way, as we can see from its appealing graph in Figure 8.1.
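Now we are ready for the proof.

Consider the set of all primes $P = \{p_1, p_2, p_3, \ldots, p_r\}$ not exceeding a positive integer N. Of the N numbers 1, 2, ..., N, there are $\lfloor N/p_i \rfloor$ divisible by $p_i$, so that

$$\sum_i \left\lfloor \frac{N}{p_i} \right\rfloor$$

counts those divisible by at least one of the primes (some of them more than once). Similarly, there are

$$\sum_{i<j}\left\lfloor \frac{N}{p_ip_j} \right\rfloor$$

of them divisible by at least two of the primes, etc. Now consider the N² pairs of integers, each of which is at most N; $\lfloor N/p_1\rfloor^2$ of them share $p_1$ as a divisor, etc., and so

$$\sum_i \left\lfloor \frac{N}{p_i} \right\rfloor^2$$

of them share a single prime as a divisor. Similarly,

$$\sum_{i<j}\left\lfloor \frac{N}{p_ip_j} \right\rfloor^2$$

of them share two primes as divisors, etc. The problem is that in doing this we have multiply counted numbers: if a number is divisible by three primes, it is divisible by any two or one of them, which is where the inclusion-exclusion principle comes in. If we write $A_i$ for the set of pairs whose members are both divisible by $p_i$, the inclusion-exclusion principle applied to $A_1, A_2, \ldots, A_r$ gives

$$\sum_i \left\lfloor \frac{N}{p_i} \right\rfloor^2 - \sum_{i<j}\left\lfloor \frac{N}{p_ip_j} \right\rfloor^2 + \sum_{i<j<k}\left\lfloor \frac{N}{p_ip_jp_k} \right\rfloor^2 - \cdots = N^2 - h_N,$$

where $h_N$ is the number of co-prime pairs. Put the other way around

$$h_N = N^2 - \sum_i \left\lfloor \frac{N}{p_i} \right\rfloor^2 + \sum_{i<j}\left\lfloor \frac{N}{p_ip_j} \right\rfloor^2 - \sum_{i<j<k}\left\lfloor \frac{N}{p_ip_jp_k} \right\rfloor^2 + \cdots$$

and so

$$\frac{h_N}{N^2} = 1 - \sum_i \left(\frac{1}{N}\left\lfloor \frac{N}{p_i} \right\rfloor\right)^2 + \sum_{i<j}\left(\frac{1}{N}\left\lfloor \frac{N}{p_ip_j} \right\rfloor\right)^2 - \sum_{i<j<k}\left(\frac{1}{N}\left\lfloor \frac{N}{p_ip_jp_k} \right\rfloor\right)^2 + \cdots.$$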
Now we are going to let N → ∞. Using result (3) we have that

$$\lim_{N\to\infty}\frac{1}{N}\left\lfloor \frac{N}{p_i} \right\rfloor = \frac{1}{p_i}$$

and so on for each term. With this and result (2), we have the probability that any two positive integers are co-prime is

$$1 - \sum_{p}\frac{1}{p^2} + \sum_{p_1<p_2}\frac{1}{p_1^2p_2^2} - \sum_{p_1<p_2<p_3}\frac{1}{p_1^2p_2^2p_3^2} + \cdots = \left(1 - \frac{1}{p_1^2}\right)\left(1 - \frac{1}{p_2^2}\right)\left(1 - \frac{1}{p_3^2}\right)\cdots = \prod_{p\ \text{prime}}\left(1 - p^{-2}\right) = \frac{1}{\zeta(2)},$$

which establishes the result.


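Using the Beautiful Formula from p. 60 with x = 2 we get

$$\int_0^\infty \frac{u}{e^u - 1}\,du = \zeta(2)\Gamma(2) = \frac{\pi^2}{6} \times 1 = \frac{\pi^2}{6},$$

so, the probability that two integers are co-prime is also

$$\left(\int_0^\infty \frac{u}{e^u - 1}\,du\right)^{-1}.$$

How is this possible? You may very well ask!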

It is also true that the probability that k randomly chosen integers are co-prime (that is, have no common factor greater than 1) is 1/ζ(k), but that we will not prove!
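The counting argument can be run exactly. The alternating signs over products of distinct primes are precisely what the Möbius function μ(d) records. μ is a standard device not used in the text: it is (−1)^k when d is a product of k distinct primes and 0 otherwise. So h_N = Σ_d μ(d)⌊N/d⌋². The sketch below checks this against a brute-force gcd count and then watches h_N/N² approach 6/π².

```python
import math

def mobius_up_to(n):
    """Mobius function mu(d) for 0 <= d <= n (mu[0] unused) by a simple sieve."""
    mu = [1] * (n + 1)
    composite = [False] * (n + 1)
    for p in range(2, n + 1):
        if not composite[p]:                      # p is prime
            for m in range(p, n + 1, p):
                if m > p:
                    composite[m] = True
                mu[m] = -mu[m]                    # one more distinct prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0                         # divisible by p^2
    return mu

def coprime_pairs(N):
    """h_N = sum_d mu(d) floor(N/d)^2: the inclusion-exclusion count of pairs
    (a, b) with 1 <= a, b <= N and no common prime factor."""
    mu = mobius_up_to(N)
    return sum(mu[d] * (N // d) ** 2 for d in range(1, N + 1))

N = 300
brute = sum(1 for a in range(1, N + 1) for b in range(1, N + 1) if math.gcd(a, b) == 1)
print(coprime_pairs(N) == brute)
print(coprime_pairs(100_000) / 100_000 ** 2, 6 / math.pi ** 2)
```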
What Is Gamma . . . Exactly?


Constants don't vary, unless they're parameters.


Anon. 


9.1 Gamma Exists 

We have pretty convincing evidence that the constant γ exists, but no precise proof. Euler did not live in an age of great mathematical rigour and he assuredly was not given to spending his days trying to prove what seemed to him to be intuitively obvious: such thoroughness was to be the stuff of the 19th century. In the 21st, we would be uncomfortable without the security of knowledge that γ really does exist and so we will deal with that matter now.

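Given that γ does exist, perhaps the first thing to notice is that we seem to have two definitions of it, one featuring ln n and the other ln(n + 1). In fact, they are equivalent and it is more generally the case that

$$\lim_{n\to\infty}\left(\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} - \ln(n + a)\right)$$

is independent of a > −n, which is easy to see:

$$\begin{aligned}\lim_{n\to\infty}&\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln(n + a)\right)\\ &= \lim_{n\to\infty}\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n - \ln(n + a) + \ln n\right)\\ &= \lim_{n\to\infty}\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n - \ln\left(1 + \frac{a}{n}\right)\right)\\ &= \lim_{n\to\infty}\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right).\end{aligned}$$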
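Here is a numerical illustration, taking n = 10^6 (an arbitrary choice). The values for different a agree with each other, and with γ, to within about (|a| + 1)/n.

```python
import math

n = 1_000_000
H_n = math.fsum(1 / r for r in range(1, n + 1))
for a in (-0.5, 0.0, 1.0, 10.0):
    print(f"a = {a:5}: H_n - ln(n + a) = {H_n - math.log(n + a):.7f}")
```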

Unsurprisingly, establishing the existence of y has attracted many proofs and 
we have chosen one that follows C. W. Barnes of the University of Mississippi. 
It is not the shortest but it is elegant, gives an equality for the Euler definition
of e, and again makes use of the value of ζ(2).

We need two very reasonable principles. 


(i) For a continuous function f(x),

$$\int_a^b f(x)\,dx = (b - a)f(\xi)$$

for some ξ ∈ [a, b].

(ii) An increasing sequence of real numbers that is bounded above must approach a limit.

The first of these principles (often known as the first mean value theorem for 
integration) simply says that the area under the continuous curve f(x) over the 
interval [a, b] is equal to the area of the rectangle based over the same interval 
with height determined by some value within that interval, as suggested in 
Figure 9.1. 

The second is a standard (and again reasonable) result of real analysis, relying 
on the completeness of ℝ.

So, we can start. Using the mean value theorem, we have on the one hand

$$\int_{1/(n+1)}^{1/n}\ln x\,dx = \left(\frac{1}{n} - \frac{1}{n+1}\right)\ln c_n = \frac{1}{n(n+1)}\ln c_n, \quad \frac{1}{n+1} < c_n < \frac{1}{n}.$$


On the other hand, if we use the world's most devious integration trick and integrate ln x = 1 × ln x by parts we have

$$\begin{aligned}\int_{1/(n+1)}^{1/n}\ln x\,dx &= \Big[x\ln x - x\Big]_{1/(n+1)}^{1/n}\\ &= \left(\frac{1}{n}\ln\frac{1}{n} - \frac{1}{n}\right) - \left(\frac{1}{n+1}\ln\frac{1}{n+1} - \frac{1}{n+1}\right)\\ &= \frac{1}{n+1}\ln(n+1) - \frac{1}{n}\ln n - \frac{1}{n(n+1)}\\ &= \frac{1}{n(n+1)}\big(n\ln(n+1) - (n+1)\ln n\big) - \frac{1}{n(n+1)}\\ &= \frac{1}{n(n+1)}\left(\ln\frac{(n+1)^n}{n^{n+1}} - 1\right).\end{aligned}$$

Equating the two forms we get

$$\frac{1}{n(n+1)}\ln c_n = \frac{1}{n(n+1)}\left(\ln\frac{(n+1)^n}{n^{n+1}} - 1\right),$$
Figure 9.1. The first mean value theorem for integration.
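which means that

$$\ln c_n=\ln\frac{(n+1)^n}{n^{n+1}}-1\quad\text{or}\quad\ln\frac{(n+1)^n}{n^{n+1}}-\ln c_n=1$$

and

$$\ln\left(\frac{(n+1)^n}{n^{n+1}}\cdot\frac{1}{c_n}\right)=1,$$

and this means that

$$\frac{(n+1)^n}{n^{n+1}c_n}=e.$$

So,

$$e=\frac{(n+1)^n}{n^n}\cdot\frac{1}{nc_n}\quad\text{and}\quad e=\left(1+\frac1n\right)^n\frac{1}{nc_n}$$

for any positive integer n.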

Since

$$\frac1{n+1}<c_n<\frac1n,\qquad n<\frac1{c_n}<n+1$$

and so

$$1<\frac{1}{nc_n}<1+\frac1n,$$

and if we write $a_n=1/(nc_n)$, we have that

$$e=a_n\left(1+\frac1n\right)^n,\qquad 1<a_n<1+\frac1n\quad\text{for } n\in\mathbb{N}.$$

This is the equality for e that we mentioned earlier.
Taking limits,

$$\lim_{n\to\infty}1\le\lim_{n\to\infty}a_n\le\lim_{n\to\infty}\left(1+\frac1n\right),$$

which makes $\lim_{n\to\infty}a_n=1$, and so we recover the Euler definition

$$e=\lim_{n\to\infty}a_n\left(1+\frac1n\right)^n=\lim_{n\to\infty}\left(1+\frac1n\right)^n.$$
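As a quick numerical sanity check of Barnes’ equality (a sketch of ours, not part of his argument), one can compute $a_n=e/(1+1/n)^n$ directly in floating point and confirm that it sits strictly between 1 and $1+1/n$. The function name `a` is our own.

```python
import math

def a(n: int) -> float:
    """a_n from e = a_n (1 + 1/n)^n, i.e. a_n = 1/(n c_n)."""
    return math.e / (1 + 1 / n) ** n

for n in (1, 2, 10, 100, 10_000):
    # Barnes' bounds: 1 < a_n < 1 + 1/n
    assert 1 < a(n) < 1 + 1 / n
    print(n, a(n))
```

The printed values drift down towards 1, in line with $\lim a_n=1$.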


Now change n to r and take logarithms of both sides of

$$e=a_r\left(1+\frac1r\right)^r$$

to get

$$1=\ln a_r+r\ln\left(1+\frac1r\right)=\ln a_r+r\ln\frac{r+1}{r}$$

and

$$\frac1r=\frac1r\ln a_r+\ln\frac{r+1}{r}.$$


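Summing gives

$$
\begin{aligned}
\sum_{r=1}^{n}\frac1r&=\sum_{r=1}^{n}\frac1r\ln a_r+\sum_{r=1}^{n}\ln\frac{r+1}{r}\\
&=\sum_{r=1}^{n}\frac1r\ln a_r+\sum_{r=1}^{n}\big(\ln(r+1)-\ln r\big)\\
&=\sum_{r=1}^{n}\frac1r\ln a_r+\ln(n+1)
\end{aligned}
$$

and so

$$\sum_{r=1}^{n}\frac1r-\ln(n+1)=\sum_{r=1}^{n}\frac1r\ln a_r.$$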

Since each $a_r>1$, the above is an increasing sequence as a function
of n; we now show that it is bounded above.

Since $1<a_n<1+1/n$,

$$\sum_{r=1}^{n}\frac1r-\ln(n+1)=\sum_{r=1}^{n}\frac1r\ln a_r<\sum_{r=1}^{n}\frac1r\ln\left(1+\frac1r\right).$$

It is geometrically clear that $\ln(1+x)<x$ for $x>0$ and so

$$\ln\left(1+\frac1r\right)<\frac1r$$

and

$$\sum_{r=1}^{n}\frac1r-\ln(n+1)<\sum_{r=1}^{n}\frac1{r^2}<\frac{\pi^2}{6},$$

and we have the promised re-emergence of ζ(2).
So, the left-hand side is bounded above as well as increasing and it therefore
tends to a limit: recalling the earlier observation that

$$\gamma=\lim_{n\to\infty}\left(\frac11+\frac12+\frac13+\cdots+\frac1n-\ln(n+a)\right),$$

put a = 1 and we are finished.
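The sequence at the heart of the proof can also be watched numerically. This sketch (plain double precision; the names `S` and `S_via_a` are ours) computes $\sum_{r=1}^{n}1/r-\ln(n+1)$ both directly and as $\sum_{r=1}^{n}(1/r)\ln a_r$, and checks that it increases while staying below $\pi^2/6$.

```python
import math

def S(n: int) -> float:
    """H_n - ln(n + 1), computed directly."""
    return math.fsum(1 / r for r in range(1, n + 1)) - math.log(n + 1)

def S_via_a(n: int) -> float:
    """The same quantity as sum_{r=1}^{n} (1/r) ln a_r, with a_r = e / (1 + 1/r)^r."""
    return math.fsum(math.log(math.e / (1 + 1 / r) ** r) / r for r in range(1, n + 1))

values = [S(n) for n in (1, 10, 100, 1000, 100_000)]
print(values)
# increasing, and below pi^2/6 = zeta(2)
print(all(x < y for x, y in zip(values, values[1:])), max(values) < math.pi ** 2 / 6)
```

The values creep up towards γ from below, which is the limit the proof establishes (with a = 1).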


9.2 Gamma Is . . . What Number? 


Now that we know γ exists, it is not unreasonable to ask for its value and, since
its exact nature remains one of its mysteries, we are bound to concentrate on
approximations. We have already seen on pp. 47 and 48 that it lies between 0
and 1 and looks likely to be around 0.5.

To find a decimal expression for γ we could simply evaluate $\gamma_n=H_n-\ln n$
for increasing values of n, but the convergence is extremely slow; for example,
$\gamma_{100}=0.582\,207\,331\,651\,53\ldots$, which is accurate only to one decimal place,
and $\gamma_{1\,000\,000}=0.577\,216\,164\,901\,481\ldots$ is accurate only to five decimal
places. With each component equally reluctantly diverging to infinity, it seems
a shame that they combine to an equally reluctant convergence. The reason is
exposed by the inequality


$$\frac{1}{2(n+1)}<\gamma_n-\gamma<\frac{1}{2n},\qquad n\in\mathbb{N}.$$

Assuming that this is true, if we want an accuracy of m decimal places, we
require $\gamma_n-\gamma<5\times10^{-m-1}$ and so

$$\frac{1}{2n}<5\times10^{-m-1},$$

which means that $n>10^m$, and the strict inequality is needed, since

$$\gamma_n-\gamma>\frac{1}{2(n+1)}=\frac{1}{2n}\left(1+\frac1n\right)^{-1}>\frac{1}{2n}\left(1-\frac1n\right)$$

and if $n=10^m$,

$$\gamma_n-\gamma>\frac{1}{2\times10^m}\left(1-\frac{1}{10^m}\right)=\frac{5}{10^{m+1}}\left(1-\frac{1}{10^m}\right)=4.\underbrace{99\cdots9}_{(m-1)\text{ times}}5\times10^{-(m+1)},$$

which guarantees that the approximation is incorrect in the mth decimal place.
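The inequality is easy to test numerically. The sketch below is an illustration only: it takes γ to double precision as a hard-coded constant (an assumption standing in for a high-precision value) and checks both of Young’s bounds for a few n.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant to double precision (assumed accurate enough)

def gamma_n(n: int) -> float:
    """gamma_n = H_n - ln n."""
    return math.fsum(1 / k for k in range(1, n + 1)) - math.log(n)

for n in (10, 100, 1000, 10_000):
    err = gamma_n(n) - GAMMA
    # Young's bounds: 1/(2(n+1)) < gamma_n - gamma < 1/(2n)
    print(n, gamma_n(n), 1 / (2 * (n + 1)) < err < 1 / (2 * n))
```

The error column shrinks only like $1/(2n)$, which is exactly the slow convergence described above.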
Having used the inequality, we give R. M. Young’s proof of it, which uses the
technique we adopted on p. 43 to describe the behaviour of the Zeta function
as $x\to1^+$. Referring to Figure 9.2,


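$$
\begin{aligned}
\sum_{n}^{N}\text{shaded areas touching the curve}
&=\left(\int_n^{n+1}\frac{dx}{x}-\frac1{n+1}\right)+\left(\int_{n+1}^{n+2}\frac{dx}{x}-\frac1{n+2}\right)+\cdots+\left(\int_{N-1}^{N}\frac{dx}{x}-\frac1N\right)\\
&=\int_n^N\frac{dx}{x}-\sum_{r=1}^{N-n}\frac{1}{n+r}\\
&=\left(\sum_{r=1}^{n}\frac1r-\ln n\right)-\left(\sum_{r=1}^{N}\frac1r-\ln N\right).
\end{aligned}
$$

Now let $N\to\infty$ and we have, by definition,

$$\sum_{n}^{\infty}\text{shaded areas}=-\gamma+\gamma_n=\gamma_n-\gamma.$$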
n 

If we now horizontally translate the shaded regions so that they all lie in the
first rectangle between n and n + 1, we see that each region has an area less
than one-half of the rectangle enclosing it (owing to the convexity of 1/x) and
so the total area of all of the regions is less than one-half of the area of the first
rectangle, which is clearly 1/n. This means that

$$\gamma_n-\gamma<\frac{1}{2n}.$$

Figure 9.3. The lower bound.


To achieve the lower bound, embed a right-angled triangle in each region as
shown in Figure 9.3, where the hypotenuse is the continuation of the hypotenuse
of the circumscribed triangle to the right. The shaded triangle and the circumscribed
one used to define it are clearly congruent and since the area of the latter
is

$$\frac12\left(\frac1{m+1}-\frac1{m+2}\right),$$

summing these gives

$$\gamma_n-\gamma=\sum_{n}^{\infty}\text{shaded areas}>\sum_{m=n}^{\infty}\frac12\left(\frac1{m+1}-\frac1{m+2}\right)=\frac{1}{2(n+1)}.$$

And so we have

$$\frac{1}{2(n+1)}<\gamma_n-\gamma<\frac{1}{2n},$$

as required.


9.3 A Surprisingly Good Improvement 


The above bound relates to the ln n form of the definition of γ and even though
in the limit we have seen that

$$\gamma=\lim_{n\to\infty}\left(\frac11+\frac12+\frac13+\cdots+\frac1n-\ln(n+a)\right)\quad\text{for any } a>-n,$$

we might expect the choice of a to influence the approximations for finite values
of n, and so it does, as we can see if we construct an error function $\varepsilon_n(a)$, defined
by

$$\varepsilon_n(a)=\frac11+\frac12+\frac13+\cdots+\frac1n-\ln(n+a)-\gamma,\qquad n\ge1,\ a>-n,$$

where γ can be represented in decimal form to any degree of accuracy using
its original definition (given we have the patience and calculating accuracy
needed). Of course, for all a, $\varepsilon_n(a)\to0$ as $n\to\infty$, but it is interesting (and
surprising) to look at the function for fixed n as a varies.

Figure 9.4. The error function near its zero. (a) n = 10, zero is 0.503 962 732 569 747;
(b) n = 100, zero is 0.500 414 587 370 329; (c) n = 10 000, zero is 0.500 004 166 069 63;
(d) n = 100 000, zero is 0.500 000 401 909 347.

If we differentiate with respect to a, we get

$$\frac{d\varepsilon_n(a)}{da}=-\frac{1}{n+a},$$

and so the function will forever (but diminishingly) decrease from $+\infty$ at its
vertical asymptote at $a=-n$, to $-\infty$ as a increases, making its zero unique.

Figure 9.4 concentrates on the interval $0\le a\le1$ and, over this small interval, the plots
inevitably give a false impression of linearity, but the eye is drawn to the zero
at a value of a ever closer to 0.5.
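The zeros quoted in the caption of Figure 9.4 can be located numerically. Since $\varepsilon_n$ is strictly decreasing, with $\varepsilon_n(0)>0>\varepsilon_n(1)$, plain bisection on $[0,1]$ is safe. This is a double-precision sketch with a hard-coded γ, so for much larger n (where rounding in $H_n$ starts to matter) its trailing digits should not be trusted.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant to double precision (assumed accurate enough)

def zero_of_eps(n: int, lo: float = 0.0, hi: float = 1.0, steps: int = 60) -> float:
    """Locate the unique zero of eps_n(a) = H_n - ln(n + a) - gamma by bisection.

    eps_n is strictly decreasing in a (derivative -1/(n + a)); it is positive
    at a = 0 and negative at a = 1, so [0, 1] brackets the zero.
    """
    h = math.fsum(1 / k for k in range(1, n + 1))  # H_n, computed once

    def eps(a: float) -> float:
        return h - math.log(n + a) - GAMMA

    for _ in range(steps):
        mid = (lo + hi) / 2
        if eps(mid) > 0:
            lo = mid  # still to the left of the zero
        else:
            hi = mid
    return (lo + hi) / 2

for n in (10, 100, 1000):
    print(n, zero_of_eps(n))
```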

If we take the strong hint provided by these plots, we would reasonably take
$a=\frac12$ if we wish to minimize the error for any n and so consider the form of
the definition as

$$\gamma=\lim_{n\to\infty}\left(\frac11+\frac12+\frac13+\cdots+\frac1n-\ln\left(n+\frac12\right)\right)=\lim_{n\to\infty}p_n.$$

Recall that, with a = 0, $\gamma_{100}$ is accurate only to one decimal place and $\gamma_{1\,000\,000}$
only to five decimal places; now, with $a=\frac12$, $p_{100}=0.577\,219\,790\,140\,49$ and
$p_{1\,000\,000}=0.577\,215\,664\,900\,631$, and these are accurate to five and eleven
decimal places, respectively.

The explanation for this huge improvement is that

$$\frac{1}{24(n+1)^2}<p_n-\gamma<\frac{1}{24n^2},$$






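and we give Duane W. DeTemple’s proof of the result:

$$
\begin{aligned}
p_n-p_{n+1}&=H_n-\ln\left(n+\tfrac12\right)-H_{n+1}+\ln\left(n+\tfrac32\right)\\
&=-\frac{1}{n+1}-\ln\left(n+\tfrac12\right)+\ln\left(n+\tfrac32\right).
\end{aligned}
$$

Define the function

$$f(x)=-\frac{1}{x+1}-\ln\left(x+\tfrac12\right)+\ln\left(x+\tfrac32\right),\qquad x>0,$$

so that $f(n)=p_n-p_{n+1}$. Then

$$
\begin{aligned}
f'(x)&=\frac{1}{(1+x)^2}-\frac{1}{x+\frac12}+\frac{1}{x+\frac32}
=\frac{1}{(1+x)^2}-\frac{1}{\left(x+\frac12\right)\left(x+\frac32\right)}\\
&=\frac{x^2+2x+\frac34-(1+2x+x^2)}{(1+x)^2\left(x+\frac12\right)\left(x+\frac32\right)}
=-\frac14(x+1)^{-2}\left(x+\tfrac12\right)^{-1}\left(x+\tfrac32\right)^{-1}
\end{aligned}
$$

and since $\left(x+\frac32\right)^{-1}<(x+1)^{-1}<\left(x+\frac12\right)^{-1}$, $-f'(x)<\frac14\left(x+\frac12\right)^{-4}$.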

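As $f(\infty)=0$,

$$f(k)=-\int_k^\infty f'(x)\,dx<\frac14\int_k^\infty\left(x+\tfrac12\right)^{-4}dx=\frac{1}{12\left(k+\frac12\right)^3}.$$

Since $\left(k+\frac12\right)^2>k(k+1)$, $\left(k+\frac12\right)^4>k^2(k+1)^2$ and $\left(k+\frac12\right)^{-4}<1/\big(k^2(k+1)^2\big)$,
and so

$$\frac{1}{\left(k+\frac12\right)^3}<\frac{k+\frac12}{k^2(k+1)^2}=\frac12\cdot\frac{2k+1}{k^2(k+1)^2}=\frac12\left(\frac{1}{k^2}-\frac{1}{(k+1)^2}\right)=\int_k^{k+1}x^{-3}\,dx.$$

So,

$$p_n-\gamma=\sum_{k=n}^{\infty}(p_k-p_{k+1})=\sum_{k=n}^{\infty}f(k)<\frac{1}{12}\sum_{k=n}^{\infty}\left(k+\tfrac12\right)^{-3}<\frac{1}{12}\int_n^\infty x^{-3}\,dx=\frac{1}{24n^2}.$$

And we have the inequality one way around.
The other half is found in the following way.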
Since $\left(x+\frac12\right)\left(x+\frac32\right)=x^2+2x+\frac34<(x+1)^2$, $\left(x+\frac12\right)^{-1}\left(x+\frac32\right)^{-1}>(x+1)^{-2}$
and so $-f'(x)>\frac14(x+1)^{-4}$.

As before,

$$f(k)=-\int_k^\infty f'(x)\,dx>\frac14\int_k^\infty(x+1)^{-4}dx=-\frac{1}{12}\Big[(x+1)^{-3}\Big]_k^\infty=\frac{1}{12(k+1)^3}.$$

So,

$$p_n-\gamma=\sum_{k=n}^{\infty}(p_k-p_{k+1})=\sum_{k=n}^{\infty}f(k)>\frac{1}{12}\sum_{k=n}^{\infty}(k+1)^{-3}>\frac{1}{12}\int_{n+1}^\infty x^{-3}\,dx=\frac{1}{24(n+1)^2}.$$


And we are done. 
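A floating-point sketch of DeTemple’s two-sided bound, together with the bounds on $f(k)=p_k-p_{k+1}$ obtained along the way. The names `p` and `f` mirror the text; γ is again hard-coded to double precision, and `math.log1p` is used only to avoid cancellation when evaluating f.

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant to double precision (assumed accurate enough)

def p(n: int) -> float:
    """p_n = H_n - ln(n + 1/2)."""
    return math.fsum(1 / k for k in range(1, n + 1)) - math.log(n + 0.5)

def f(x: float) -> float:
    """f(x) = -1/(x+1) - ln(x+1/2) + ln(x+3/2), so that f(n) = p_n - p_{n+1}.

    The two logarithms are combined as log1p(1/(x + 1/2)) to limit cancellation.
    """
    return -1 / (x + 1) + math.log1p(1 / (x + 0.5))

for n in (1, 10, 100, 1000):
    err = p(n) - GAMMA
    # DeTemple: 1/(24(n+1)^2) < p_n - gamma < 1/(24 n^2)
    print(n, p(n), 1 / (24 * (n + 1) ** 2) < err < 1 / (24 * n ** 2))
```

In practice f(k) turns out to sit very close to its lower bound $1/(12(k+1)^3)$, so the lower estimate is the sharper of the two.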

Again, if we want an accuracy of m decimal places, we require $p_n-\gamma<5\times10^{-m-1}$, and so

$$\frac{1}{24n^2}<5\times10^{-m-1}\quad\text{and}\quad n>\sqrt{\frac{10^{m+1}}{5\times24}}\approx0.288\,675\times10^{m/2}.$$


Again, the strict inequality is needed, since

$$p_n-\gamma>\frac{1}{24(n+1)^2}=\frac{1}{24n^2}\left(1+\frac1n\right)^{-2}>\frac{1}{24n^2}\left(1-\frac2n\right)$$

and so, if $n=\sqrt{10^{m+1}/(5\times24)}$,

$$p_n-\gamma>\frac{5\times24}{24\times10^{m+1}}\left(1-\frac{2\sqrt{120}}{10^{(m+1)/2}}\right)=\left(5-\frac{10\sqrt{120}}{10^{(m+1)/2}}\right)\times10^{-(m+1)},$$

which again guarantees that the approximation is incorrect in the mth decimal
place.


9.4 The Germ of a Great Idea 

Stretching the properties of ≈ perhaps a little too much, we can rewrite the
statement

$$\frac{1}{2(n+1)}<\gamma_n-\gamma<\frac{1}{2n}$$

as

$$\gamma_n-\gamma\approx\frac{1}{2n}\quad\text{or}\quad\gamma\approx H_n-\ln n-\frac{1}{2n},$$

which for n = 1000 gives γ ≈ 0.577 215 581 568 204 . . . , which is accurate to
six decimal places, and for n = 1 000 000 gives γ ≈ 0.577 215 664 901 481 . . . ,
which is accurate to twelve decimal places; it may not be rigorous but we are
on the right track in approximating γ! Actually, this is the first of a series of
approximations, which continue mysteriously as

$$\gamma\approx H_n-\ln n-\frac{1}{2n}+\frac{1}{12n^2}-\frac{1}{120n^4}+\frac{1}{252n^6}-\frac{1}{240n^8}+\cdots+\frac{1}{12n^{14}},$$

the mystery deepening with the knowledge that the term involving $n^{12}$ has −691
on the top and 32 760 on the bottom.

In fact, the approximation may be written more fully as

$$\gamma\approx H_n-\ln n-\frac{1}{2n}+\sum_{r=1}^{\infty}\frac{B_{2r}}{2r}\,\frac{1}{n^{2r}},$$

which is a special case of the Euler-Maclaurin summation formula, where the $B_{2r}$
are known as the Bernoulli Numbers; we will look at both of these next.
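The successive corrections can be tried out directly. This sketch hard-codes $B_2,\dots,B_{14}$ as exact fractions (the values that appear in the list of Bernoulli Numbers in the next chapter) and adds the first few terms of the series to $H_n-\ln n-\frac{1}{2n}$; it is an illustration in double precision, not a rigorous error analysis.

```python
import math
from fractions import Fraction

# B_2, B_4, ..., B_14, hard-coded from the table of Bernoulli Numbers
B_EVEN = [Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42), Fraction(-1, 30),
          Fraction(5, 66), Fraction(-691, 2730), Fraction(7, 6)]

def gamma_approx(n: int, terms: int) -> float:
    """H_n - ln n - 1/(2n) + sum_{r=1}^{terms} B_{2r} / (2r n^{2r})."""
    total = math.fsum(1 / k for k in range(1, n + 1)) - math.log(n) - 1 / (2 * n)
    for r, b in enumerate(B_EVEN[:terms], start=1):
        total += float(b) / (2 * r * n ** (2 * r))
    return total

for terms in range(4):
    print(terms, gamma_approx(1000, terms))
```

With no Bernoulli terms this reproduces the six-decimal estimate quoted above for n = 1000; each further term buys several more digits until double precision runs out.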
CHAPTER TEN


Gamma as a Decimal 


A mathematician is a blind man in a dark room looking for a black cat which
isn’t there.

Charles Darwin (1809-1882) 


10.1 Bernoulli Numbers 


Our earlier focus on the Zeta series has meant that, in terms of the summation
of series, we have in a way started on the second rung of the ladder, with the
first occupied by the family $1^k+2^k+3^k+\cdots+n^k$ for $k\in\mathbb{N}$. In 1784,
at the age of seven, Gauss had famously summed the integers from 1 to 100
in seconds (to the amazement of his teacher) when he noticed that the series
could be thought of as 50 pairs of numbers each summing to 101; of course,
the young genius could not have known that the ancient Greeks, Hindus and
Arabs each had rules which amounted to the sum for $k\le4$, nor would he
have been aware of the work of Johann Faulhaber (1580-1635). Known in his
time as ‘The Great Arithmetician (or weaver) of Ulm’, Faulhaber was indeed
trained as a weaver but his mathematical prowess brought his appointment as
the city’s mathematician and surveyor, who designed waterwheels, fortifications
and surveying instruments and who associated and collaborated with the likes
of Kepler, Descartes and Napier; he also prepared the first German publication
of Briggs’s logarithms. In fact, he was a ‘Cossist’ more than an ‘Arithmetician’,
whose 1631 publication Academiae Algebrae contained not only the sums up
to k = 17 but also the important observation that


$$1^k+2^k+3^k+\cdots+n^k=\begin{cases}\text{a polynomial in } n(n+1), & k\text{ odd},\\(2n+1)\times\text{a polynomial in } n(n+1), & k\text{ even}.\end{cases}$$


(The term Cossist derives from the Italian word ‘cosa’, meaning ‘thing’; the
mathematicians of the time used the word to represent an unknown quantity; we
would use the word ‘algebraist’.) In 1636 Fermat had need of an answer as he
calculated such sums in his development of the quadrature of $f(x)=x^k$, prior
to Newton’s calculus. He found a recurrence relation relating the sum for k with
the sums for k − 1, k − 2, . . . , which was ingenious but soon intractable and,
although improved on in 1654 by Blaise Pascal (1623-1662), the problem had
to wait until the next century to be solved by one of its greatest mathematical
names.

The first few expressions can be written:

$$
\begin{aligned}
1+2+3+\cdots+n&=\tfrac12n(n+1),\\
1^2+2^2+3^2+\cdots+n^2&=(2n+1)\tfrac16n(n+1),\\
1^3+2^3+3^3+\cdots+n^3&=\tfrac14\big[n(n+1)\big]^2,\\
1^4+2^4+3^4+\cdots+n^4&=(2n+1)\tfrac1{30}n(n+1)\big[3n(n+1)-1\big],
\end{aligned}
$$

which reveal nothing more than Faulhaber’s observation and the very pretty
relationship

$$1^3+2^3+3^3+\cdots+n^3=(1+2+3+\cdots+n)^2.$$
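The four formulas and the ‘very pretty’ relationship can be checked exhaustively for small n with exact arithmetic. A minimal sketch (the function names are ours):

```python
from fractions import Fraction

def power_sum(n: int, k: int) -> int:
    """Brute-force 1^k + 2^k + ... + n^k."""
    return sum(i ** k for i in range(1, n + 1))

def closed_forms(n: int) -> dict:
    """The four closed forms displayed above, evaluated exactly."""
    m = n * (n + 1)  # Faulhaber's building block n(n + 1)
    return {
        1: Fraction(m, 2),
        2: Fraction((2 * n + 1) * m, 6),
        3: Fraction(m * m, 4),
        4: Fraction((2 * n + 1) * m * (3 * m - 1), 30),
    }

for n in range(1, 50):
    forms = closed_forms(n)
    assert all(forms[k] == power_sum(n, k) for k in forms)
    # sum of cubes = (sum of integers)^2
    assert power_sum(n, 3) == power_sum(n, 1) ** 2
```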

It was Jacob Bernoulli who solved the problem and the solution was announced
to the world in his famous treatise Ars Conjectandi, posthumously published
in 1713. In listing the results to k = 10, Bernoulli described the pattern that
mattered; somewhat generously inferring that others might also have the same
powers of insight, he wrote (without proof):

Whoever will examine the series as to their regularity may be able 
to continue the table. Taking c to be the power of any exponent or 


$$\int n^c\propto\frac{1}{c+1}n^{c+1}+\frac12n^c+\frac{c}{2}An^{c-1}+\frac{c.c-1.c-2}{2.3.4}Bn^{c-3}+\frac{c.c-1.c-2.c-3.c-4}{2.3.4.5.6}Cn^{c-5}+\frac{c.c-1.c-2.c-3.c-4.c-5.c-6}{2.3.4.5.6.7.8}Dn^{c-7}\cdots$$

and so on, the exponents of n continually decreasing by 2 until n
or nn is reached. The capital letters A, B, C, D denote in order
the coefficients of the last terms in the expressions for $\int nn$, $\int n^4$,
$\int n^6$, $\int n^8$, etc., namely, A is equal to 1/6, B is equal to −1/30, C
is equal to 1/42, D is equal to −1/30.

These coefficients are such that each one completes the others in 
the same expression to unity. Thus D must have the value −1/30
because

$$\frac19+\frac12+\frac23-\frac7{15}+\frac29+(+D)=1.$$
With the help of this table it took me less than half of a quarter
of an hour to find that the tenth powers of the first 1000 numbers
being added together will yield the sum

91 409 924 241 424 243 424 241 924 242 500.

From this it will become clear how useless was the work of Ismael
Bullialdus spent on the compilation of his voluminous Arithmetica
Infinitorum in which he did nothing more than compute with immense
labour the sums of the first six powers, which is only a part
of what we have accomplished in the space of a single page.
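Bernoulli’s boast is easy to test today. The sketch below evaluates his formula with exact fractions, using A, B, C, D as he gives them plus the next coefficient, 5/66 (not quoted here, but it appears in the list of Bernoulli Numbers below); with those five coefficients it covers the powers c = 1, ..., 11. The helper names are ours.

```python
from fractions import Fraction
from math import factorial

# Bernoulli's A, B, C, D, followed by the next coefficient, 5/66
COEFFS = [Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42),
          Fraction(-1, 30), Fraction(5, 66)]

def falling(c: int, j: int) -> int:
    """c(c-1)...(c-j+1): Bernoulli's products c.c-1.c-2 and so on."""
    result = 1
    for i in range(j):
        result *= c - i
    return result

def bernoulli_sum(c: int, n: int) -> Fraction:
    """1^c + 2^c + ... + n^c by Bernoulli's formula (valid for 1 <= c <= 11 with COEFFS)."""
    total = Fraction(n ** (c + 1), c + 1) + Fraction(n ** c, 2)
    for k, coeff in enumerate(COEFFS, start=1):
        if 2 * k > c:  # the exponent c - 2k + 1 would fall below 1: the series has stopped
            break
        total += Fraction(falling(c, 2 * k - 1), factorial(2 * k)) * coeff * n ** (c - 2 * k + 1)
    return total

print(bernoulli_sum(10, 1000))  # Bernoulli's 91409924241424243424241924242500
```

The denominators 2.3.4, 2.3.4.5.6, ... of the quoted formula are just the factorials (2k)!, which is why `factorial(2 * k)` appears.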


The withering comment regarded the prodigious efforts of Ismael Bullialdus
(1605-1694), who needed a six-volume opus to achieve the result for the first
six powers. Notice the use of the ‘backwards proportional sign’ for ‘=’, of nn
for $n^2$, of what is now the integral sign for summation (Euler’s influence had
yet to take effect), of a dot for multiplication and the implied brackets in the
expressions involving c. Incidentally, he erroneously gave the coefficient of $n^2$
for k = 9 as $-\frac1{12}$ rather than its correct value of $-\frac3{20}$. In identifying what he
called A, B, C, D, . . . Bernoulli had isolated the numbers in the expansion
which are independent of the power and if we begin to list them all, including
those which are zero, we have the appropriately named (by Euler) Bernoulli
Numbers $B_0, B_1, B_2, \ldots$:

$$1,\ -\frac12,\ \frac16,\ 0,\ -\frac1{30},\ 0,\ \frac1{42},\ 0,\ -\frac1{30},\ 0,\ \frac5{66},\ 0,\ \ldots$$

with a pattern anything but transparent. The next term is −691/2730 and, to
emphasize the point, the sequence continues

$$0,\ \frac76,\ 0,\ -\frac{3617}{510},\ 0,\ \frac{43\,867}{798},\ \ldots$$

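In more modern guise, and using the standard notation

$$\binom{n}{r}=\frac{n!}{r!\,(n-r)!}$$

for the binomial coefficients, Bernoulli was really saying that

$$
\begin{aligned}
1^2+2^2+3^2+\cdots+n^2&=\tfrac13n^3+\tfrac12n^2+\tfrac16n
=\tfrac13\left[\tbinom30B_0n^3-\tbinom31B_1n^2+\tbinom32B_2n\right],\\
1^3+2^3+3^3+\cdots+n^3&=\tfrac14n^4+\tfrac12n^3+\tfrac14n^2
=\tfrac14\left[\tbinom40B_0n^4-\tbinom41B_1n^3+\tbinom42B_2n^2+\tbinom43B_3n\right],\\
1^4+2^4+3^4+\cdots+n^4&=\tfrac15n^5+\tfrac12n^4+\tfrac13n^3-\tfrac1{30}n\\
&=\tfrac15\left[\tbinom50B_0n^5-\tbinom51B_1n^4+\tbinom52B_2n^3+\tbinom53B_3n^2+\tbinom54B_4n\right],
\end{aligned}
$$

where the $B_1$ terms carry a minus sign because we take $B_1=-\frac12$.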


Although the Bernoulli Numbers lack an obvious pattern, they do possess a
recursive definition, which Bernoulli announced through his computation of
D. His explanation related to the expansion for k = 8, which he gave as

$$1^8+2^8+3^8+\cdots+n^8=\frac19n^9+\frac12n^8+\frac23n^7-\frac7{15}n^5+\frac29n^3-\frac1{30}n.$$

Noting that for n = 1 both sides must be 1, it is possible to solve for any
one of the numbers in terms of the others and this he did for his D, using what is,
for us, a slightly strange algebraic form. Every odd-numbered Bernoulli Number
(other than the first) is 0 and of course every even one can be found from the
recurrence relation, albeit tediously. There are plenty of alternative ways of
generating them and they appear as part of the coefficients of any number of
expansions, for example

$$\frac{x}{e^x-1}=\sum_{n=0}^{\infty}B_n\frac{x^n}{n!}$$

(given by Euler), and they can be efficiently generated in terms of what are
known as ‘tangent numbers’, but no one would describe them as cooperative.
Euler computed them up to $B_{30}$, in 1840 Ohm extended this to $B_{62}$ and the following
year Adams computed them to $B_{124}$, the numerator of which has 110
digits (contrasting with the denominator, which is simply the number 30). The
calculations cry out for the computational aids that we now take for granted,
an application of computers that was presaged in 1843 by Augusta Ada King,
Countess Lovelace (and daughter of Lord Byron), who suggested to Charles
Babbage that he produce a ‘plan’ for their calculation, using his Analytical
Engine. Later, in her annotated translation of a publication of one Luigi Federico
Menabrea (one time Professor of Mechanics at Turin and later the Italian
premier) dealing with ideas relating to the Analytical Engine, she described
several such ‘plans’, which might be considered to be the earliest recorded
computer programs for a device which she romantically posited ‘weaves algebraic
patterns, just as the Jacquard-loom weaves flowers and leaves’.
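The recurrence is simple to mechanize with exact rational arithmetic. This sketch assumes the convention $B_1=-\frac12$ and uses the standard form $\sum_{j=0}^{m}\binom{m+1}{j}B_j=0$ for $m\ge1$, which is equivalent to Bernoulli’s requirement that the coefficients in each power-sum formula add up to 1 when n = 1. It is a plain $O(m^2)$ method, not the tangent-number approach.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(limit: int) -> list:
    """Return [B_0, B_1, ..., B_limit] via sum_{j=0}^{m} C(m+1, j) B_j = 0 (m >= 1)."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        # the m-th relation contains B_m with coefficient C(m+1, m) = m + 1
        acc = sum(comb(m + 1, j) * B[j] for j in range(m))
        B.append(-acc / (m + 1))
    return B

B = bernoulli_numbers(32)
print(B[:13])
```

Running it to `bernoulli_numbers(124)` takes a moment but confirms the remark about $B_{124}$: its denominator is 30.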
With Fermat’s Last Theorem finally laid to rest, it is of no more (and no less)
than historic interest that the Bernoulli Numbers have played their role in its
attempted resolution. In 1850 Ernst Kummer (1810-1893) proved the theorem
for all powers which were ‘regular’ primes, with the definition of ‘regular’ the
elegant ‘a prime p is regular if and only if it does not divide the numerator
of $B_2, B_4, B_6, \ldots, B_{p-3}$’. It is known that the number of irregular primes is
infinite but unfortunately whether the same is true for regular primes is unknown
(the first irregular prime is 37 since $B_{32}=-208\,360\,028\,141\times37/510$).
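Kummer’s criterion can be tested directly once the Bernoulli Numbers are available. This self-contained sketch repeats the recurrence (with $B_1=-\frac12$), then flags an odd prime p as irregular when p divides the numerator of one of $B_2, B_4, \ldots, B_{p-3}$; the trial-division primality test is deliberately naive.

```python
from fractions import Fraction
from math import comb, isqrt

def bernoulli_numbers(limit: int) -> list:
    """[B_0, ..., B_limit] from sum_{j=0}^{m} C(m+1, j) B_j = 0 (m >= 1), B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def is_prime(n: int) -> bool:
    """Naive trial division; adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def irregular_primes(bound: int) -> list:
    """Odd primes p <= bound dividing the numerator of some B_2, B_4, ..., B_{p-3}."""
    B = bernoulli_numbers(max(bound, 2))
    return [p for p in range(3, bound + 1)
            if is_prime(p) and any(B[k].numerator % p == 0 for k in range(2, p - 2, 2))]

print(irregular_primes(100))  # expected: [37, 59, 67]
```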

10.2 Euler-Maclaurin Summation 

We have noted that 

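$$\gamma=\lim_{n\to\infty}\left(\frac11+\frac12+\frac13+\cdots+\frac1n-\ln n\right)$$

can be thought of as the difference between the sum and the integral of the
function $f(x)=1/x$, in that

$$
\begin{aligned}
\gamma&=\lim_{n\to\infty}\left(\frac11+\frac12+\frac13+\cdots+\frac1n-\ln n\right)
=\lim_{n\to\infty}\left(\sum_{k=1}^{n}\frac1k-\int_1^n\frac1x\,dx\right)\\
&=\lim_{n\to\infty}\left(\sum_{k=1}^{n}f(k)-\int_1^n f(x)\,dx\right)
\end{aligned}
$$

and if we relegate γ to secondary importance we could write

$$\sum_{k=1}^{n}\frac1k\approx\int_1^n\frac{dx}{x}+\gamma.$$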

With this emphasis we are approximating a sum by an integral and even though
integration can be tough it can also be significantly easier than summation: we
may be on to a good idea here. We are, but in developing the initiative Euler
and Colin Maclaurin (1698-1746) have beaten us by the best part of 300 years,
producing what has become known as the Euler-Maclaurin summation formula.
We will not prove it but we will use it for our purposes, and it has wide application
in many areas of mathematics, perhaps most of all in numerical analysis, analytic
number theory and the general theory of asymptotic expansions. In 1736 Euler
had developed both the simplest form of the formula and, later in the year, the
general form, quite independently of Maclaurin, who published it in his
Treatise of Fluxions of 1742. In one of its general forms it states

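$$\sum_{k=1}^{n}f(k)=\int_1^n f(x)\,dx+\frac12\big(f(1)+f(n)\big)+\sum_{k=1}^{m}\frac{B_{2k}}{(2k)!}\Big(f^{(2k-1)}(n)-f^{(2k-1)}(1)\Big)+R_n(f,m),$$

$$\big|R_n(f,m)\big|\le\frac{2}{(2\pi)^{2m}}\int_1^n\big|f^{(2m+1)}(x)\big|\,dx,$$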

where the $B_{2k}$ are, of course, the Bernoulli Numbers and the (2k − 1)th ‘powers’
of the function are in fact the (2k − 1)th derivatives of it. Use of the expansion
can be subtle and here is a case when neglect of the remainder term $R_n(f,m)$
can be perilous since for most functions that appear in applications the series
diverges; fortunately, it is usual that not many terms are needed to achieve
good accuracy and so the approximations provided by the series are generally
excellent. This fact troubled Euler and it was left to Siméon Poisson
(1781-1840) in 1823 to pay serious attention to the remainder term.

10.3 Two Examples 

1. As a first move, we can gain some confidence by showing that the Euler-Maclaurin
formula gives the result we would expect for $f(x)=x^3$. The derivatives
are, of course, $f'(x)=3x^2$, $f''(x)=6x$ and $f'''(x)=6$; the remaining
derivatives are zero and so the error term is zero too:

$$
\begin{aligned}
\sum_{k=1}^{n}k^3&=\int_1^n x^3\,dx+\frac12\left(1^3+n^3\right)+\frac{B_2}{2!}\left(3n^2-3\times1^2\right)+\frac{B_4}{4!}(6-6)\\
&=\frac14n^4-\frac14+\frac12+\frac12n^3+\frac12\times\frac16\left(3n^2-3\right)\\
&=\frac14n^4+\frac12n^3+\frac14n^2=\left(\frac12n(n+1)\right)^2.
\end{aligned}
$$
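The first example can be repeated mechanically. The sketch below (ours) assumes f is a polynomial, stored as a list of coefficients with the constant term first, so that its derivatives are exact and, with enough terms, the remainder $R_n(f,m)$ vanishes; it evaluates the right-hand side of the formula with fractions and compares it with the sum itself. The Bernoulli Numbers come from the recurrence with $B_1=-\frac12$.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(limit):
    """[B_0, ..., B_limit] via sum_{j=0}^{m} C(m+1, j) B_j = 0, with B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, limit + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def evaluate(poly, x):
    """Value of sum c_i x^i for coefficients [c_0, c_1, ...]."""
    return sum(c * x ** i for i, c in enumerate(poly))

def derivative(poly):
    return [i * c for i, c in enumerate(poly)][1:]

def antiderivative(poly):
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(poly)]

def euler_maclaurin(poly, n):
    """Right-hand side of the Euler-Maclaurin formula on [1, n]; exact for polynomials."""
    m = (len(poly) + 1) // 2            # enough terms for f^(2m+1) to vanish
    B = bernoulli_numbers(2 * m)
    F = antiderivative(poly)
    total = evaluate(F, n) - evaluate(F, 1) + Fraction(evaluate(poly, 1) + evaluate(poly, n)) / 2
    d = derivative(poly)                # f', the case k = 1 of f^(2k-1)
    for k in range(1, m + 1):
        total += B[2 * k] / factorial(2 * k) * (evaluate(d, n) - evaluate(d, 1))
        d = derivative(derivative(d))   # f^(2k-1) -> f^(2k+1)
    return total

cube = [0, 0, 0, 1]                     # f(x) = x^3, the first example
print(euler_maclaurin(cube, 10), sum(k ** 3 for k in range(1, 11)))
```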

2. As a second application of the formula, we will look at a justly famous (if
misnamed) result for approximating n! for large n. This time take $f(x)=\ln x$
to get

$$f'(x)=\frac1x,\quad f''(x)=-\frac1{x^2},\quad f'''(x)=\frac2{x^3},\quad\ldots,\quad f^{(n)}(x)=\frac{(-1)^{n-1}(n-1)!}{x^n}.$$
This time we will suppress the error term to get

$$\sum_{k=1}^{n}\ln k=\int_1^n\ln x\,dx+\frac12(\ln1+\ln n)+\frac{B_2}{2!}\left(\frac1n-1\right)+\frac{B_4}{4!}\left(\frac2{n^3}-2\right)+\frac{B_6}{6!}\left(\frac{24}{n^5}-\frac{24}{1^5}\right)+\cdots.$$

Using standard properties of logarithms on the left-hand side and that meanest
integration-by-parts trick on the right-hand side (writing $\ln x=1\times\ln x$), we
get

$$\ln n!=n\ln n-n+\frac12\ln n+\frac1{12n}-\frac1{360n^3}+\frac1{1260n^5}+\cdots+C_n,$$

where $C_n$ is the constant to this number of terms. Now exponentiate both sides
to get

$$n!\approx n^ne^{-n}\sqrt n\,e^{C_n}\exp\left(\frac1{12n}-\frac1{360n^3}+\frac1{1260n^5}\right).$$

Using the Taylor expansion of $e^x$ then gives (which the reader can check!)

$$n!\approx n^ne^{-n}\sqrt n\,e^{C_n}\left(1+\frac1{12n}+\frac1{288n^2}-\frac{139}{51\,840n^3}-\frac{571}{2\,488\,320n^4}+\frac{163\,879}{209\,018\,880n^5}+\cdots\right),$$

which could be an excellent approximation to n!, if only we knew the asymptotic
value of $e^{C_n}$, given that the limit exists. The series is the well-known ‘Stirling
approximation’, which James Stirling (1692-1770) published to the first eight
terms in his most important work Methodus Differentialis of 1730. In fact, his
interest was in the logarithms of factorials and he left the series in its logarithmic
form, computing $\log_{10}1000!$ to 10 decimal places, using an approximation for
the constant. In the same year Abraham de Moivre (1667-1754) published
Miscellanea Analytica, which, apart from anything else, contained his own
(later to be corrected) table of logarithms, his own form of the approximation
and a proof of the constant’s existence. It would be some years before Stirling
would be able to find the constant in exact form, and in doing so he found that
$e^{C_n}\to\sqrt{2\pi}$ as $n\to\infty$; the series is then

$$n!\sim n^ne^{-n}\sqrt{2\pi n}\left(1+\frac1{12n}+\frac1{288n^2}-\frac{139}{51\,840n^3}-\frac{571}{2\,488\,320n^4}+\frac{163\,879}{209\,018\,880n^5}+\cdots\right).$$
Here is a case when the error term does misbehave, since for any fixed n it
decreases as we take more terms, but only up to a point, after which it starts to
increase; fortunately, with m fixed, as n increases the error term does tend to zero
and we obtain ever better approximations to n!.
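To see how quickly the truncated series closes in on n!, this double-precision sketch uses the six coefficients displayed above together with $e^{C_n}\to\sqrt{2\pi}$, and reports the relative error as more terms are kept. For n = 10 the error keeps falling through all six terms; as noted above, that cannot continue indefinitely for fixed n.

```python
import math
from fractions import Fraction

# Coefficients of 1, 1/n, ..., 1/n^5 in the bracketed Stirling series
STIRLING = [Fraction(1), Fraction(1, 12), Fraction(1, 288), Fraction(-139, 51840),
            Fraction(-571, 2488320), Fraction(163879, 209018880)]

def stirling(n: int, terms: int) -> float:
    """n^n e^(-n) sqrt(2 pi n) times the first `terms` terms of the series."""
    series = sum(float(c) / n ** k for k, c in enumerate(STIRLING[:terms]))
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * series

n = 10
for terms in range(1, 7):
    print(terms, stirling(n, terms) / math.factorial(n) - 1)  # relative error
```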

We will need Stirling’s approximation several times and, while it is handy, we
can sensibly mention another constant that in a way arises from it and which
we will also mention again later. We can rewrite the first-order approximation
as

$$\frac{n!}{n^{n+1/2}e^{-n}}\approx\sqrt{2\pi},$$

meaning that

$$\lim_{n\to\infty}\frac{n!}{n^{n+1/2}e^{-n}}=\sqrt{2\pi}.$$

Replacing n! by some other asymptotically large quantity and dividing by an
appropriate expression can lead to a constant other than $\sqrt{2\pi}$. In particular, the
nice $0^01^12^23^3\cdots n^n$ and $f(n)=n^{n^2/2+n/2+1/12}e^{-n^2/4}$ combine so that

$$\lim_{n\to\infty}\frac{0^01^12^23^3\cdots n^n}{f(n)}=A,$$

the Glaisher-Kinkelin constant, which is about 1.282 427 13 . . . .
Exotic it may be, but useful it is too, as we will see!
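A rough numerical check of the Glaisher-Kinkelin limit (our own illustration, not taken from the text): working with logarithms avoids forming the enormous product $1^12^2\cdots n^n$. The convergence is slow, the relative error being of order $1/n^2$, so only a handful of digits should be expected.

```python
import math

def glaisher_estimate(n: int) -> float:
    """exp( sum_{k<=n} k ln k - [(n^2/2 + n/2 + 1/12) ln n - n^2/4] ), which tends to A."""
    log_hyper = math.fsum(k * math.log(k) for k in range(1, n + 1))
    log_f = (n * n / 2 + n / 2 + 1 / 12) * math.log(n) - n * n / 4
    return math.exp(log_hyper - log_f)

for n in (10, 100, 1000):
    print(n, glaisher_estimate(n))
```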


10.4 The Implications for Gamma 

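If we apply the Euler-Maclaurin formula to $f(x)=1/x$, we get

$$f'(x)=-\frac1{x^2},\quad f''(x)=\frac{2}{x^3},\quad\ldots,\quad f^{(r)}(x)=\frac{(-1)^r r!}{x^{r+1}}.$$

This means that

$$
\begin{aligned}
\sum_{k=1}^{n}\frac1k&=\int_1^n\frac{dx}{x}+\frac12\left(\frac11+\frac1n\right)+\sum_{k=1}^{m}\frac{B_{2k}}{(2k)!}(-1)^{2k-1}(2k-1)!\left(\frac1{n^{2k}}-1\right)+R_n(f,m)\\
&=\ln n+\frac12\left(1+\frac1n\right)+\sum_{k=1}^{m}\frac{B_{2k}}{2k}\left(1-\frac1{n^{2k}}\right)+R_n(f,m),
\end{aligned}
$$

with the factorial cancelling and the odd power of −1 replaced by −1 itself.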
But

$$\gamma=\lim_{n\to\infty}\left(\sum_{k=1}^{n}\frac1k-\ln n\right)=\frac12+\sum_{k=1}^{m}\frac{B_{2k}}{2k}+R_\infty(f,m)$$

and so

$$\sum_{k=1}^{n}\frac1k=\ln n+\gamma+\frac1{2n}-\sum_{k=1}^{m}\frac{B_{2k}}{2k}\,\frac1{n^{2k}}+\big(R_n(f,m)-R_\infty(f,m)\big)$$

and looking at the first few terms (and ignoring the error term) we have

$$\sum_{k=1}^{n}\frac1k\approx\ln n+\gamma+\frac1{2n}-\frac1{12n^2}+\frac1{120n^4}-\frac1{252n^6}+\cdots$$

and we have

$$\gamma\approx\sum_{k=1}^{n}\frac1k-\ln n-\frac1{2n}+\frac1{12n^2}-\frac1{120n^4}+\frac1{252n^6}+\cdots.$$

And here is the generalization of the series for γ that has been suggested on
p. 79. Euler used the series up to the term $1/(12n^{14})$ and, with n = 10,
$H_{10}=2.928\,968\,253\,968\,253\,9$ and $\ln10=2.302\,585\,092\,994\,045\,684$, computed γ
to those 16 decimal places: 0.577 215 664 901 532 5 . . . .
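Euler’s calculation is easy to repeat. The sketch below uses exact fractions for $H_{10}$ and the Bernoulli terms, and Python’s `decimal` module with 30 significant digits (our choice of precision) for $\ln 10$; like Euler, it stops at the term $1/(12n^{14})$.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 30  # working precision for the Decimal part (our choice)

# B_2, B_4, ..., B_14 (hard-coded)
B_EVEN = [Fraction(1, 6), Fraction(-1, 30), Fraction(1, 42), Fraction(-1, 30),
          Fraction(5, 66), Fraction(-691, 2730), Fraction(7, 6)]

def to_decimal(q: Fraction) -> Decimal:
    return Decimal(q.numerator) / Decimal(q.denominator)

def euler_gamma(n: int = 10) -> Decimal:
    """H_n - ln n - 1/(2n) + sum_{r=1}^{7} B_{2r} / (2r n^{2r}), the last term being 1/(12 n^14)."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    correction = -Fraction(1, 2 * n) + sum(b / (2 * r * n ** (2 * r))
                                           for r, b in enumerate(B_EVEN, start=1))
    return to_decimal(h + correction) - Decimal(n).ln()

print(euler_gamma())  # compare with gamma = 0.57721566490153286060...
```

The first omitted term, $B_{16}/(16\times10^{16})$, is about $4\times10^{-17}$ in size, which is why n = 10 is enough for sixteen places.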

Of course, the desire to extend the accuracy of the estimate was great and,
in 1790, the Italian geometer Lorenzo Mascheroni (1750-1800) published in
Adnotationes ad calculum integrale Euleri an approximation of γ to 32 decimal
places, which he had calculated in a similar way; the estimate then became
0.577 215 664 901 532 860 618 1 . . . . This was all well and good until 1809,
when Johann von Soldner (1766-1833) used his

$$\mathrm{Li}(x)=\int_2^x\frac{dt}{\ln t}$$

function (which will engage our attention later) to give the value

0.577 215 664 901 532 860 606 5 . . . ,

which differs from Mascheroni’s value in the 20th decimal place (and after). The matter
was resolved (but the confusion not removed) when, in 1812, the inimitable
Gauss prevailed on the 19-year-old calculating prodigy F. G. B. Nicolai
(1793-1846) to check the results. This he did, using the Euler-Maclaurin summation
formula with n = 50 and recalculating with n = 100 to evaluate γ to 40
decimal places, and finding agreement with Soldner. In spite of this, both values
were in circulation (and even appeared together in one publication), which
led subsequent indefatigable calculators (again using Euler-Maclaurin summation)
independently to provide their own confirmation of Soldner’s estimate.
Mascheroni’s permanent contribution to γ’s story (apart from making a mistake
that led to at least eight subsequent recalculations of the number) was to name it
γ (we have seen that Euler originally used C, and O and A have also been used).
By such serendipity, its full accepted name is the Euler-Mascheroni constant.
(A more distinguished legacy of Mascheroni is his result that any geometric
construction that is possible with straight edge and compass can be achieved
with a compass alone.)

Inevitably, things have moved on since then: in 1962 Donald Knuth took 250
terms of the Euler-Maclaurin series, with n = 10 000, to compute γ to 1271
decimal places and in 1997 Thomas Papanikolaou computed it to 1 000 000
decimal places (the one millionth digit is 9) and in 1999 it was calculated
to 108 000 000 decimal places by P. Demichel and X. Gourdon! At the time
of the paperback printing, the latest approximation is to $10^{10}$ decimal places,
recorded on 30 June 2008 by Shigeru Kondo and Steve Pagliarulo. Of course,
such accuracy is far beyond anything that can conceivably prove ‘useful’, but
that is not the point, an observation made in 1915 by James Glaisher
(1848-1928) when he expressed the view:

No doubt the desire to obtain the values of these quantities to a
great many figures is also partly due to the fact that most of them
are interesting in themselves; for e, π, γ, ln 2, and many other
numerical quantities occupy a curious and some of them almost a
mysterious, place in mathematics, so that there is a natural tendency
to do what can be done towards their precise determination.
Gamma as a Fraction


A man is like a fraction whose numerator is what he is and whose denominator 
is what he thinks of himself. The larger the denominator the smaller the fraction. 

Count Lev Nikolaevich Tolstoy (1828-1910) 


11.1 A Mystery 

It is a simple matter of arithmetic to use the decimal approximations of a number 
to generate fractional approximations of it. For example, 

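$$\gamma=0.577\,215\,664\,901\,532\,860\,606\,5\ldots$$

results in the approximations:

$$\frac{5}{10},\ \frac{57}{100},\ \frac{577}{1000},\ \frac{5772}{10\,000},\ \frac{57\,721}{100\,000},\ \ldots=\frac12,\ \frac{57}{100},\ \frac{577}{1000},\ \frac{1443}{2500},\ \frac{57\,721}{100\,000},\ \ldots$$

Yet, compare the accuracy of the approximations with the mysterious sequence

$$\frac35,\ \frac47,\ \frac{11}{19},\ \frac{15}{26},\ \frac{71}{123},\ \frac{228}{395},\ \frac{3035}{5258},\ \ldots$$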

'2'y'i 007 

And what about 355355 ? These perplexing numbers are progressively more 
accurate approximations to γ and better than any comparable fraction arising
as above. If we do want to approximate γ by fractions, we would do well to
look to them. The question is, where do they come from?
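One standard way to generate fractions of this kind is the continued-fraction expansion, used here as an illustration. The sketch treats the 22-decimal value of γ quoted above as an exact rational, extracts its first partial quotients and builds the convergents $p_k/q_k$ from the usual recurrences. Only the first few quotients should be trusted, since the decimal is itself only an approximation to γ.

```python
from fractions import Fraction

GAMMA = Fraction("0.5772156649015328606065")  # the decimal quoted above, as an exact rational

def partial_quotients(x: Fraction, count: int) -> list:
    """First `count` terms a0; a1, a2, ... of the continued fraction of x."""
    quotients = []
    for _ in range(count):
        a = x.numerator // x.denominator  # integer part
        quotients.append(a)
        frac = x - a
        if frac == 0:
            break
        x = 1 / frac
    return quotients

def convergents(quotients: list) -> list:
    """Convergents p_k/q_k from p_k = a_k p_{k-1} + p_{k-2} (and likewise for q_k)."""
    p_prev, p = 1, quotients[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

print(convergents(partial_quotients(GAMMA, 10)))
```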

11.2 A Challenge 

Fermat was given to posing number-theoretic problems. The most famous of
them is his ‘Last Theorem’ (so called because it is the last of his assertions to
succumb to proof), but there were numerous others. Euler disposed of many of
them and one in particular was partly solved by him in 1759 and completed by
Joseph-Louis Lagrange (1736-1813) in 1768. It was half of a challenge thrown
to the European mathematical community by Fermat in January 1657 and read,
‘Find a cube which, when increased by the sum of its proper divisors, becomes a
square’; the other half was the same question with the words ‘square’ and ‘cube’
reversed. Bernard Frenicle de Bessy (1605-1675), an official at the French mint,
a fine amateur mathematician and computor and correspondent of several of
the great mathematical names of the time (particularly Fermat), provided four
solutions to the first problem on the day he received it, and six more the day
following. The challenge echoed across the English Channel to find the deaf
ears of Wallis (who may well have been its main target) and the comment,
‘Whatever the details of the matter, it finds me too absorbed by numerous
occupations for me to be able to devote my attention to it immediately. . . ’.
Undeterred, a second challenge followed the next month, part of which was to
find an integer y which would make $dy^2+1$ a perfect square for any positive
non-square integer d, or failing that, to solve the two special cases d = 61, 109. Again,
Frenicle de Bessy played his part by calculating the smallest solutions for all
d < 150 and challenged others to at least solve the cases d = 150 and d = 313,
hinting that the second example may be beyond anyone’s ability! Fermat fuelled
the intellectual ferment with ‘We await these solutions, which, if England or
Belgic or Celtic Gaul do not produce, then Narbonese Gaul will.’ (Narbonese
Gaul was the area around Toulouse where Fermat lived.) Finally, rising to the
bait, Wallis found particular solutions to both in very quick time and in doing
so approached the solution of the ignored, initial challenge, as we show below.
The challenges had generated interest in a problem that was 500 years older
than Fermat and which became the subject of study and learned treatises by
many, including the first president of The Royal Society, William Brouncker
(1620?-1684).

If we consider the first challenge and make the reasonable assumption that
Fermat had meant the cube to be that of a prime number p, we require
$1 + p + p^2 + p^3 = q^2$ or $(1 + p)(1 + p^2) = q^2$. Since 2 is the only common
factor of the two brackets (as the reader may wish to prove), the equation may be written as

$$ab = \left(\tfrac{1}{2}q\right)^2$$

with $a = \tfrac{1}{2}(1 + p)$ and $b = \tfrac{1}{2}(1 + p^2)$ co-prime.

Since a and b have no common factors and their product is a perfect square, we can
legitimately conclude that $a = m^2$ and $b = n^2$ for some integers m and n and therefore that

$$1 + p = 2a = 2m^2 \quad\text{and}\quad 1 + p^2 = 2b = 2n^2,$$

so any such p must satisfy both the equations $p = 2m^2 - 1$ and $p^2 = 2n^2 - 1$.
We are looking for primes of the form $2m^2 - 1$ whose squares are of the form
$2n^2 - 1$, which looks as though it might be a big ask.

With this analysis we can see that the two challenges are essentially the
same. The second equation is the more demanding of the two and is a special
case of the problem: for a given non-square integer d find all x and y so that
$x^2 - dy^2 = \pm 1$, which is known as Pell’s equation, after John Pell (1611-1685),
another of the Founder Members of The Royal Society. His 1659 translation
into English of one Johann Rahn’s Teutsche Algebra brought to the English-speaking
mathematical world the use of ÷ for division and it may have been
Pell himself who originated this use of the notation (the ‘obelus’ had been used
for subtraction long before this). It was Euler who attached Pell’s name to the
equation, but it is generally considered to be a rather generous (or mistaken)
honour. With the plus sign and d = 4 729 494 it can also feature in the solution of
a surprisingly difficult problem regarding the size of a herd of cattle, which was
purportedly set by Archimedes to Apollonius as another (possibly revengeful)
intellectual challenge. Whether or not Archimedes originally formulated the
problem as a challenge or otherwise, it appeared in The Sandreckoner, which
we mentioned on p. 3. It has subsequently earned the name of ‘Archimedes’
Revenge’, as the herd turns out to have a size which has 206 545 digits.

11.3 An Answer 

What has this to do with those mysterious fractions that approximate γ (and
any other number) so well? They are called the ‘convergents’ of what are known
as ‘continued fractions’ (or, archaically, ‘anthyphairetic ratios’). Firstly, it was
Wallis who coined the name (in the 1653 edition of his book Arithmetica Infinitorum);
they have been studied by any number of mathematicians over the years,
including the 6th-century Indian mathematician Aryabhata (in whose work they
make their first appearance), Johann Lambert and Joseph-Louis Lagrange (who
made significant contributions to the theory), Christian Huygens (who used
them in his design of a mechanical model of the Solar System), Euler (who laid
down much of the modern theory of them, and used them to prove that both e
and $e^2$ are irrational) and Gauss (who explored many of their deep properties).
Perhaps their heyday was in the 19th century but there is a current resurgence
of interest in them, partly through their connection with chaos theory and computer
algorithms and they do have their part to play in our story. We will only
see a tiny part of the use of this comparatively overlooked area of mathematics,
but enough to be clear that they are more important than they at first seem and
less difficult to use than they first look. Firstly, their definition.

A continued fraction is an expression of the form

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{a_4 + \cdots}}}}$$

where $a_0$ is an integer (possibly negative or zero) and $a_1, a_2, \ldots$ are
positive integers; the expression could be finite or it could go on forever.
Standard fractional notation is cumbersome and has given way to the alternative
$[a_0; a_1, a_2, \ldots]$, in which the semicolon separates the number’s integer from its
fractional part and the commas separate what are known as its ‘partial quotients’.
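For example,

$$3 + \cfrac{1}{2 + \cfrac{1}{5 + \cfrac{1}{4}}} = 3 + \cfrac{1}{2 + \cfrac{1}{\left(\frac{21}{4}\right)}} = 3 + \cfrac{1}{2 + \frac{4}{21}} = 3 + \cfrac{1}{\left(\frac{46}{21}\right)} = 3 + \frac{21}{46} = \frac{159}{46},$$

or in a more compact notation, $[3; 2, 5, 4] = \frac{159}{46}$. If we build up the expression
one term at a time, we get

$$3, \qquad 3 + \frac{1}{2} = \frac{7}{2}, \qquad 3 + \cfrac{1}{2 + \frac{1}{5}} = \frac{38}{11} \qquad\text{and}\qquad \frac{159}{46},$$

thereby generating the ‘convergents’ of the continued fraction. Put another way,
$\frac{159}{46}$ is approximately $\frac{7}{2}$ and also $\frac{38}{11}$, with the latter the better approximation.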
Clearly, any finite continued fraction can be telescoped into an ordinary fraction 
in this way, with each of the convergents successively better approximants to that 
fraction. Converting an ordinary fraction to its continued form simply requires 
us to strip off the integer part, invert and repeat the process; for example, 


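$$\begin{aligned}
\frac{18}{13} &= 1 + \frac{5}{13} = 1 + \cfrac{1}{\left(\frac{13}{5}\right)} = 1 + \cfrac{1}{2 + \frac{3}{5}} = 1 + \cfrac{1}{2 + \cfrac{1}{\left(\frac{5}{3}\right)}} \\
&= 1 + \cfrac{1}{2 + \cfrac{1}{1 + \frac{2}{3}}} = 1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{\left(\frac{3}{2}\right)}}} = 1 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1 + \frac{1}{2}}}}
\end{aligned}$$

or $[1; 2, 1, 1, 2]$, and in the same way, $\frac{18}{13}$ is successively (and more accurately)
approximated by $\frac{3}{2}$, $\frac{4}{3}$ and $\frac{7}{5}$. This highlights a possible source of ambiguity, as
the $\frac{1}{2}$ above could have been inverted to 2 and then split into two 1s (since $2 = 1 + \frac{1}{1}$), but it is
standard practice to agree that the fraction does not end with a 1.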
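The strip-off-invert-repeat procedure and the build-up of convergents can be sketched in a few lines of Python. This is an illustrative sketch rather than anything in the text: the function names are ours, exact `Fraction` arithmetic is assumed so that no rounding creeps in, and the convergents are produced with the standard recurrence $p_k = a_k p_{k-1} + p_{k-2}$, $q_k = a_k q_{k-1} + q_{k-2}$, which is not stated above but reproduces the values just found.

```python
from fractions import Fraction

def continued_fraction(x: Fraction) -> list[int]:
    """Partial quotients of a rational x: strip off the integer part, invert, repeat."""
    terms = []
    while True:
        a = x.numerator // x.denominator   # floor of x (also right for negative x)
        terms.append(a)
        remainder = x - a
        if remainder == 0:                 # a rational number always terminates
            return terms
        x = 1 / remainder                  # invert the fractional part

def convergents(terms: list[int]) -> list[Fraction]:
    """Convergents of [a0; a1, a2, ...] via the three-term recurrence."""
    p_prev, p = 1, terms[0]
    q_prev, q = 0, 1
    result = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        result.append(Fraction(p, q))
    return result

print(continued_fraction(Fraction(159, 46)))            # [3, 2, 5, 4]
print(continued_fraction(Fraction(18, 13)))             # [1, 2, 1, 1, 2]
print([str(c) for c in convergents([1, 2, 1, 1, 2])])   # ['1', '3/2', '4/3', '7/5', '18/13']
```

Because the last remainder is always a unit fraction $1/m$ with $m \ge 2$, the sketch automatically respects the convention that the expansion does not end with a 1 (unless the input is itself a whole number).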


11.4 Three Results

Continued fractions have many properties and are a fascinating subject in their 
own right, but at present we must resist the temptation to study them beyond 
mentioning the three properties of them that we will need, and even these we 
will not prove. 

1. Each convergent is automatically in its lowest terms. 

2. If $p_n/q_n$ are convergents that approximate an irrational number x and if
$q \le q_n$ and if $p/q \ne p_n/q_n$, then $|p_n/q_n - x| < |p/q - x|$ and, more
strongly, $|p_n - q_n x| < |p - qx|$.

This means that each convergent of a continued fraction is the best- 
possible fractional approximation to x with a denominator of its size 
or less. 

3. If x is an irrational number and a and b are co-prime integers such that

$$\left| x - \frac{a}{b} \right| < \frac{1}{2b^2},$$

then $a/b$ is one of the convergents of the continued-fraction representation
of x.


11.5 Irrationals 


The process of converting an irrational number to a continued fraction simply 
requires the decimal expansion to be dealt with in much the same way as a 
rational number. For example, 


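$$\begin{aligned}
\pi &= 3 + 0.14159\ldots = 3 + \cfrac{1}{7.062513\ldots} = 3 + \cfrac{1}{7 + \cfrac{1}{15.996594\ldots}} \\
&= 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1.003417\ldots}}} = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + 0.634\ldots}}}},
\end{aligned}$$

which continues as

$$\pi = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, \ldots],$$

with initial convergents $\frac{22}{7}$, $\frac{333}{106}$, $\frac{355}{113}$ and $\frac{103\,993}{33\,102}$.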
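To see the expansion of π without hand arithmetic, one can feed a decimal truncation of π into the same strip-invert-repeat loop. The sketch below assumes that 50 decimal places (hard-coded as a string) are ample for the first couple of dozen partial quotients, which is safe because those convergents have denominators far below $10^{25}$; it is an illustration, not how the expansion is obtained in the text.

```python
from fractions import Fraction

# pi truncated to 50 decimal places: an assumed-adequate stand-in for pi itself
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

def partial_quotients(x: Fraction, count: int) -> list[int]:
    """The first `count` partial quotients of x (strip, invert, repeat)."""
    terms = []
    for _ in range(count):
        a = x.numerator // x.denominator
        terms.append(a)
        if x == a:
            break
        x = 1 / (x - a)
    return terms

def convergents(terms: list[int]) -> list[Fraction]:
    """Convergents via p_k = a_k p_{k-1} + p_{k-2}, and likewise for q_k."""
    p_prev, p, q_prev, q = 1, terms[0], 0, 1
    out = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(Fraction(p, q))
    return out

terms = partial_quotients(PI_50, 22)
print(terms)   # [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84]
print([str(c) for c in convergents(terms)[:5]])
# ['3', '22/7', '333/106', '355/113', '103993/33102']
```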

Of course, $\frac{22}{7}$ is the approximation with which we are most familiar and,
in what is possibly the first recorded attempt to approximate π, Archimedes
included in his work Measurement of a Circle the bounds $\frac{223}{71} < \pi < \frac{22}{7}$, which
he found by inscribing and circumscribing a circle with regular polygons of 96
sides. We know that $\frac{22}{7}$ is universally accepted as the most convenient approximation
to use and with good reason, since we know from above that there is no
fraction with a smaller denominator that is better. For the same reason, the more
accurate $\frac{355}{113}$ is the best-possible rational approximant to π with denominator
at most 113, which says good things about the 16th-century European mathematicians
who were known to use it and even better things about the Chinese mathematician
Tsu Chung-chih (A.D. 430-501), who described $\frac{22}{7}$ as an ‘inaccurate value’
and $\frac{355}{113}$ as the ‘accurate value’ of π. Notice that the other Archimedean bound, $\frac{223}{71}$,
is not a convergent.

(It is impossible to resist mentioning the nice result that

$$\int_0^1 \frac{x^4(1 - x)^4}{1 + x^2}\,dx = \frac{22}{7} - \pi,$$

which can be proved by using polynomial division and term-by-term integration
to arrive at the indefinite integral $\frac{1}{7}x^7 - \frac{2}{3}x^6 + x^5 - \frac{4}{3}x^3 + 4x - 4\tan^{-1}x$.)
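The integral identity is easy to spot-check numerically. The sketch below is our own: it uses a plain composite Simpson's rule with an arbitrarily chosen 1000 sub-intervals to compare the integral with $\frac{22}{7} - \pi$, and confirms with exact fractions that the polynomial part of the antiderivative contributes exactly $\frac{22}{7}$ at $x = 1$, the $-4\tan^{-1}x$ term supplying the $-\pi$.

```python
import math
from fractions import Fraction

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return x**4 * (1 - x)**4 / (1 + x**2)

approx = simpson(integrand, 0.0, 1.0)
print(approx, 22 / 7 - math.pi)          # both about 0.0012645

# polynomial part of the antiderivative, x^7/7 - 2x^6/3 + x^5 - 4x^3/3 + 4x, at x = 1
poly_at_1 = Fraction(1, 7) - Fraction(2, 3) + 1 - Fraction(4, 3) + 4
print(poly_at_1)                         # 22/7
```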

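The continued fraction for other numbers can be found in the same way. For
example, $\sqrt{2} = [1; 2, 2, 2, 2, \ldots]$ with convergents $\frac{3}{2}, \frac{7}{5}, \frac{17}{12}, \ldots$.

The ‘Golden Ratio’

$$\varphi = \tfrac{1}{2}(1 + \sqrt{5}) = [1; 1, 1, 1, 1, \ldots],$$

with convergents the ratios of consecutive Fibonacci numbers

$$\frac{2}{1}, \frac{3}{2}, \frac{5}{3}, \ldots,$$

and

$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, \ldots],$$

with convergents $3, \frac{8}{3}, \frac{11}{4}, \frac{19}{7}, \frac{87}{32}, \ldots$.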

Notice how the continued-fraction representation of these numbers reveals an 
otherwise hidden pattern and one that makes them exceptional in an important 
and strange way, which we will discuss in Chapter 14. 

It is also true that $\pi^4 = [97; 2, 2, 3, 1, 16539, 1, \ldots]$, which makes the fifth
convergent, $\frac{2143}{22}$, a particularly accurate rational approximation to $\pi^4$ (and
therefore its fourth root is a particularly accurate decimal approximation to
π, differing from it only in the 9th decimal place).
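The claim about $\pi^4$ can be checked in the same way. The sketch assumes, as a working hypothesis, that a 50-place truncation of π (giving roughly 48 reliable places of $\pi^4$) is enough for the first seven partial quotients; the names are illustrative only.

```python
import math
from fractions import Fraction

# pi truncated to 50 decimal places (assumed accurate enough for this purpose)
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

def partial_quotients(x: Fraction, count: int) -> list[int]:
    """The first `count` partial quotients of x (strip, invert, repeat)."""
    terms = []
    for _ in range(count):
        a = x.numerator // x.denominator
        terms.append(a)
        if x == a:
            break
        x = 1 / (x - a)
    return terms

print(partial_quotients(PI_50 ** 4, 7))    # [97, 2, 2, 3, 1, 16539, 1]

# the fifth convergent [97; 2, 2, 3, 1], evaluated from the bottom up
approx = Fraction(97) + 1 / (2 + 1 / (2 + 1 / (3 + Fraction(1, 1))))
print(approx)                              # 2143/22
print(math.pi - float(approx) ** 0.25)     # about 1.0e-9
```

The huge partial quotient 16539 immediately after the fifth convergent is what makes $\frac{2143}{22}$ so good.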
Those earlier fractional approximations to γ come, of course, from its own
continued fraction form of

$$\gamma = [0; 1, 1, 2, 1, 2, 1, 4, 3, 13, 5, 1, 1, 8, 1, 2, 4, 1, 1, 40, 1, 11, 3, 7, 1, 7, 1, 1, 5, 1, 49, 4, 1, 65, \ldots]$$

with convergents

$$\frac{1}{2}, \frac{3}{5}, \frac{4}{7}, \frac{11}{19}, \frac{15}{26}, \frac{71}{123}, \frac{228}{395}, \frac{3035}{5258}, \ldots, \frac{323\,007}{559\,595}, \ldots$$

As an indication of the accuracy that is achieved,

$$\left| \gamma - \frac{323\,007}{559\,595} \right| \approx 1.025 \times 10^{-12}.$$
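The same loop applied to a 50-place value of γ (a standard published value, hard-coded here as an assumption) recovers the opening partial quotients and the convergent $\frac{323\,007}{559\,595}$, together with its error. Only about 20 correct places are actually needed for the 20 partial quotients printed.

```python
from fractions import Fraction

# Euler's constant to 50 decimal places (assumed value, hard-coded)
GAMMA_50 = Fraction("0.57721566490153286060651209008240243104215933593992")

def partial_quotients(x: Fraction, count: int) -> list[int]:
    """The first `count` partial quotients of x (strip, invert, repeat)."""
    terms = []
    for _ in range(count):
        a = x.numerator // x.denominator
        terms.append(a)
        if x == a:
            break
        x = 1 / (x - a)
    return terms

def convergents(terms: list[int]) -> list[Fraction]:
    """Convergents via p_k = a_k p_{k-1} + p_{k-2}, and likewise for q_k."""
    p_prev, p, q_prev, q = 1, terms[0], 0, 1
    out = [Fraction(p, q)]
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(Fraction(p, q))
    return out

terms = partial_quotients(GAMMA_50, 20)
print(terms)   # [0, 1, 1, 2, 1, 2, 1, 4, 3, 13, 5, 1, 1, 8, 1, 2, 4, 1, 1, 40]
conv = convergents(terms)
best = Fraction(323007, 559595)
print(best in conv, float(abs(GAMMA_50 - best)))   # True and about 1.02e-12
```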


Thomas Papanikolaou, who was mentioned earlier, also calculated the continued
fraction for γ up to and including the 470 006th partial quotient and from
this he could conclude that if γ is rational, the denominator of the fraction must
be greater than $10^{242\,080}$. Of course, an infinite number of fractions with such
denominators (and larger) do exist, but (unreliable) intuition moves us to think
that a ‘naturally occurring’ number, such as γ, simply would not behave in such
an extreme way; to confound that view, someone needs to produce an accepted
‘natural’ fraction with such a denominator! This aspect of γ’s behaviour was
touched on by the great German mathematician David Hilbert (1862-1943) in
a seminal lecture given in 1900, which we will describe in more detail and from
which we will quote at greater length later:

Take any definite unsolved problem, such as the question as to the
irrationality of the Euler-Mascheroni constant C, or the existence
of an infinite number of prime numbers of the form $2^n + 1$. However
unapproachable these problems may seem to us and however
helpless we stand before them, we have, nevertheless, the firm conviction
that their solution must follow by a finite number of purely
logical processes.

The mathematical world still awaits the discovery of that particular ‘finite num- 
ber of purely logical processes’. 

The connection with Pell’s equation is profound. 


11.6 Pell’s Equation Solved

The solutions to Pell’s equation are hardly predictable: if we take it as $a^2 - db^2 = 1$,
then with d = 60 the smallest solution is a = 31, b = 4; with d = 62 it is
a = 63, b = 8; yet with d = 61 it is a = 1 766 319 049, b = 226 153 980!
If a and b satisfy $a^2 - db^2 = 1$, they cannot possibly have any common
factors. That said, the underlying pattern is revealed by the following argument:

$$a^2 - db^2 = 1 \iff (a - b\sqrt{d})(a + b\sqrt{d}) = 1 \iff \frac{a}{b} - \sqrt{d} = \frac{1}{b(a + b\sqrt{d})}.$$

The factorization makes clear that $a > b\sqrt{d}$ and so we can manufacture the
inequality

$$0 < \frac{a}{b} - \sqrt{d} = \frac{1}{b(a + b\sqrt{d})} < \frac{1}{b(b\sqrt{d} + b\sqrt{d})} = \frac{1}{2b^2\sqrt{d}} < \frac{1}{2b^2}.$$

Invoke the third property of continued fractions and we see that $a/b$ must be
a convergent of $\sqrt{d}$, so the search for the solutions of Fermat’s problems should
be among the continued-fraction expansions of the numbers defining them. For
example, with Fermat’s first problem on p. 92, the first solution is m = 2 and
n = 5 to give p = 7 and q = 20 as the smallest solution.
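The argument suggests a practical recipe: walk through the convergents $a/b$ of $\sqrt{d}$ and stop at the first with $a^2 - db^2 = 1$. The sketch below assumes the standard exact integer recurrence for the partial quotients of $\sqrt{d}$, which the text does not give; the function names and the search cap are our own. It also reruns Fermat’s first problem through the convergents of $\sqrt{2}$.

```python
import itertools
from math import isqrt

def sqrt_convergents(d):
    """Yield the convergents (p, q) of sqrt(d) for a positive non-square d,
    using exact integer arithmetic throughout."""
    a0 = isqrt(d)
    m, den, a = 0, 1, a0
    p_prev, p, q_prev, q = 1, a0, 0, 1
    while True:
        yield p, q
        m = den * a - m                  # next partial quotient of sqrt(d)
        den = (d - m * m) // den
        a = (a0 + m) // den
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev

def pell(d, rhs=1, max_steps=1000):
    """First convergent (a, b) of sqrt(d) with a^2 - d b^2 = rhs, or None if none
    appears within max_steps (an arbitrary cap; rhs = -1 may have no solution)."""
    for a, b in itertools.islice(sqrt_convergents(d), max_steps):
        if a * a - d * b * b == rhs:
            return a, b
    return None

print(pell(60), pell(62), pell(61))   # (31, 4) (63, 8) (1766319049, 226153980)

# Fermat's first problem: p^2 = 2 n^2 - 1 with p = 2 m^2 - 1 prime
fermat_solutions = []
for p, n in itertools.islice(sqrt_convergents(2), 12):
    if p * p - 2 * n * n != -1:
        continue
    m = isqrt((p + 1) // 2)
    is_prime = p > 1 and all(p % k for k in range(2, isqrt(p) + 1))
    if is_prime and 2 * m * m - 1 == p:
        q = isqrt(1 + p + p**2 + p**3)
        fermat_solutions.append((p, m, n, q))
print(fermat_solutions)   # [(7, 2, 5, 20)]
```

Among the first twelve convergents of $\sqrt{2}$, only p = 7 passes every test, matching $7^3 + 1 + 7 + 49 = 400 = 20^2$.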

11.7 Filling the Gaps

Continued fractions are the first choice among many possibilities for rational
approximation, but they do leave plenty of gaps in the list of best possible
approximants. With γ we have seen that we have the consecutive continued
fraction convergents of

$$\frac{1}{2}, \frac{3}{5}, \frac{4}{7}, \frac{11}{19}, \frac{15}{26}, \frac{71}{123}, \frac{228}{395}, \frac{3035}{5258}, \ldots$$

but if we set a computer to search for the best-possible rational approximations
up to and including any given denominator, we get

$$\frac{1}{2}, \frac{3}{5}, \frac{4}{7}, \frac{11}{19}, \frac{15}{26}, \frac{41}{71}, \frac{56}{97}, \frac{71}{123}, \frac{157}{272}, \frac{228}{395}, \ldots$$

and the next interval is filled with

$$\frac{228}{395}, \frac{1667}{2888}, \frac{1895}{3283}, \frac{2123}{3678}, \frac{2351}{4073}, \frac{2579}{4468}, \frac{2807}{4863}, \frac{3035}{5258}, \ldots$$

and, of course, the gaps get bigger and so does the list of fractions to fill them.
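The text does not say how its computer search was carried out. One plain way, sketched below on the assumption that ‘best possible’ means closer to γ than every fraction with a smaller denominator, is to try the nearest numerator for each denominator in turn and keep only the record-breakers. Double-precision γ is assumed to be accurate enough at these small denominators.

```python
GAMMA = 0.5772156649015329   # Euler's constant to double precision

def best_approximations(x, max_den):
    """Record-breaking fractions p/q: each is closer to x than every fraction
    with a smaller denominator. For each q only the nearest numerator can
    compete, so we try p = round(x * q)."""
    records, best_err = [], float("inf")
    for q in range(1, max_den + 1):
        p = round(x * q)
        err = abs(x - p / q)
        if err < best_err:      # strictly better; ties such as 2/4 versus 1/2 are skipped
            best_err = err
            records.append((p, q))
    return records

print(", ".join(f"{p}/{q}" for p, q in best_approximations(GAMMA, 5258)))
```

The output starts with 1/1 (which the lists above leave out) and then reproduces both lists, ending at 3035/5258.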

In short, continued fractions provide a nice, systematic means of rational
approximation and they are extremely useful in general theory, but they do not
tell the whole story; very few things do.

11.8 The Harmonic Alternative 

We will introduce (without pursuing the idea far) just one alternative method
of fractional approximation, mainly because it encourages deeper thought about
our base 10 system and also it provides a first example of the usefulness of the
terms of the harmonic series. It has to be said that the method does not find the
best fractions, but it is nonetheless novel and worthy of study.

We are really interested in the fractional part of a number and so we will
choose to divide a decimal fraction into its whole number part, considered as a
single number, and its fractional part, divided into its components. For example,
the expression 62.372 58 is a shorthand for

$$62 + \frac{1}{10} \times 3 + \frac{1}{10^2} \times 7 + \frac{1}{10^3} \times 2 + \frac{1}{10^4} \times 5 + \frac{1}{10^5} \times 8,$$

which can be written in the rather more complicated form

$$62 + \frac{1}{10}\left(3 + \frac{1}{10}\left(7 + \frac{1}{10}\left(2 + \frac{1}{10}\left(5 + \frac{1}{10} \times 8\right)\right)\right)\right)$$

and of course such expressions could be extended indefinitely, as the number
dictates. The 3, 7, 2, 5, 8 are simply a special case of any sequence of non-negative
integers which are each less than 10 and we could adopt notation
similar to that for continued fractions, writing the number as [62; 3, 7, 2, 5, 8].
More generally,

$$[n; a, b, c, \ldots] = n + \frac{1}{10}\left(a + \frac{1}{10}\left(b + \frac{1}{10}\left(c + \cdots\right)\right)\right),$$

where n is the whole number part and a, b, c, ... form the fractional part and
so are non-negative integers ≤ 9.

So far this is doing no more than looking at the obvious in a different way
and playing with notation, but the expanded form of the expression, with its
repeated $\frac{1}{10}$, suggests that we could alter that number to a different one to
achieve a representation in another base (the a, b, c, ... would naturally be
restricted to be less than that base). This is nothing new. Replace $\frac{1}{10}$ by $\frac{1}{2}$ and
we have the binary system of 0s and 1s, with $\frac{1}{3}$ the ternary system, etc. More
interesting still, what if we mix the bases and represent the number in a mixed-base
system, using the terms of the harmonic series? This would mean that
our number would be written in the form

$$n + \frac{1}{2}\left(a + \frac{1}{3}\left(b + \frac{1}{4}\left(c + \cdots\right)\right)\right)$$

and rational approximations to it would be any first part of this, where a < 2,
b < 3, c < 4, ....

A closer look at the form of this representation reveals that, rather than writing
the number as

$$n + \frac{1}{10}a + \frac{1}{10^2}b + \cdots \quad\text{with } a, b, \ldots \le 9,$$

we are writing it as

$$n + \frac{1}{2!}a + \frac{1}{3!}b + \frac{1}{4!}c + \cdots \quad\text{with } a < 2,\ b < 3,\ c < 4, \ldots$$

If we start with π, we get

$$\pi = 3 + \frac{1}{2}\left(0 + \frac{1}{3}\left(0 + \frac{1}{4}\left(3 + \frac{1}{5}\left(1 + \frac{1}{6}\left(5 + \frac{1}{7}\left(6 + \frac{1}{8}\left(5 + \cdots\right)\right)\right)\right)\right)\right)\right)$$

or in the more compact notation

$$\pi = [3; 0, 0, 3, 1, 5, 6, 5, \ldots]$$

to give the fractional approximations

$$\frac{25}{8}, \frac{47}{15}, \frac{2261}{720}, \frac{15\,833}{5040}, \frac{42\,223}{13\,440}, \frac{11\,400\,211}{3\,628\,800}, \ldots$$

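Since the Taylor expansion of $e^x$ is

$$1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots,$$

putting x = 1 gives e as

$$1 + 1 + \frac{1}{2}\left(1 + \frac{1}{3}\left(1 + \frac{1}{4}\left(1 + \frac{1}{5}\left(1 + \cdots\right)\right)\right)\right)$$

or the very nice $e = [2; 1, 1, 1, 1, 1, 1, 1, \ldots]$ to give fractional approximations

$$\frac{5}{2}, \frac{8}{3}, \frac{65}{24}, \frac{163}{60}, \frac{1957}{720}, \frac{685}{252}, \frac{109\,601}{40\,320}, \ldots$$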

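Finally, with γ we have

$$\gamma = 0 + \frac{1}{2}\left(1 + \frac{1}{3}\left(0 + \frac{1}{4}\left(1 + \frac{1}{5}\left(4 + \frac{1}{6}\left(1 + \frac{1}{7}\left(4 + \frac{1}{8}\left(1 + \frac{1}{9}\left(3 + \frac{1}{10}\left(0 + \cdots\right)\right)\right)\right)\right)\right)\right)\right)\right)$$

or in the shorthand notation [0; 1, 0, 1, 4, 1, 4, 1, 3, 0, ...] and the rational
approximations

$$\frac{1}{2}, \frac{1}{2}, \frac{13}{24}, \frac{23}{40}, \frac{83}{144}, \frac{2909}{5040}, \frac{23\,273}{40\,320}, \frac{3491}{6048}, \frac{3491}{6048}, \ldots$$

These various approximations are not at all bad, as the reader can measure.
Notice that with the possibility of a zero, consecutive approximations can be
the same. We will soon be looking at a variety of other ways in which the
harmonic series appears.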
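Finding the ‘digits’ in this mixed base is mechanical: multiply the fractional part by 2 and peel off the whole-number part, multiply what is left by 3 and peel again, then by 4, and so on. The sketch below does exactly that with exact fractions. The 30-place values of π and γ are hard-coded assumptions, and e is replaced by the exact partial sum $\sum_{k=0}^{29} 1/k!$, whose first 28 mixed-base digits are exactly 1.

```python
import math
from fractions import Fraction

def harmonic_digits(x: Fraction, count: int):
    """Whole part n and the first `count` digits a, b, c, ... with
    x = n + (1/2)(a + (1/3)(b + (1/4)(c + ...))); the k-th place uses base k."""
    n = x.numerator // x.denominator
    frac = x - n
    digits = []
    for k in range(2, count + 2):
        frac *= k                          # move one place along the mixed base
        d = frac.numerator // frac.denominator
        digits.append(d)                   # always 0 <= d < k
        frac -= d
    return n, digits

def approximations(n, digits):
    """Successive truncations n + a/2! + b/3! + c/4! + ... as exact fractions."""
    total, factorial, out = Fraction(n), 1, []
    for k, d in enumerate(digits, start=2):
        factorial *= k
        total += Fraction(d, factorial)
        out.append(total)
    return out

PI_30 = Fraction("3.141592653589793238462643383279")      # assumed 30-place value
GAMMA_30 = Fraction("0.577215664901532860606512090082")   # assumed 30-place value
E_PARTIAL = sum(Fraction(1, math.factorial(k)) for k in range(30))

for name, x in [("pi", PI_30), ("e", E_PARTIAL), ("gamma", GAMMA_30)]:
    n, digits = harmonic_digits(x, 9)
    print(name, n, digits, [str(f) for f in approximations(n, digits)])
```

For π the ninth place turns out to be 0, which is why the approximation $\frac{42\,223}{13\,440}$ repeats before $\frac{11\,400\,211}{3\,628\,800}$ appears.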
Where Is Gamma?


One cannot escape the feeling that these mathematical formulas have an inde- 
pendent existence and an intelligence of their own, that they are wiser than we 
are, wiser even than their discoverers, that we get more out of them than was 
originally put into them. 

Heinrich Hertz (1857-1894) 

Gamma’s definition $\gamma = \lim_{n\to\infty}(H_n - \ln n)$, when rewritten as the asymptotic
approximation $H_n \approx \ln n + \gamma$, provides a simple (and accurate) method for
approximating the partial sums of the harmonic series. The lack of an explicit
formula for $H_n$ together with its glacially slow divergence makes the approximation
all the more important and with that approximation we have an inevitable
appearance of γ; already we have seen the estimate used on a number of occasions.
Its connection with the Gamma function guarantees γ’s role in analysis
and the Gamma function’s connection with the Zeta functions guarantees γ’s
role in number theory. The number is inevitably, intrinsically (and frequently,
intricately) involved in mathematics, reluctant though it is to show itself in elementary
areas of the subject. It would be easy to relegate this chapter to a long
list of integrals, sums, products and limits which involve γ but instead we will
give a representative few and leave it to the interested reader to seek out more;
in doing this we will be paying no more than lip-service to that ‘serious consideration’
of which it is worthy. To begin with we will look at another example
of γ allowing the harmonic series to be replaced by logarithms, this time not
as an estimate but as the exact limit.

12.1 The Alternating Harmonic Series Revisited 

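The name Riemann has already appeared several times, attached to the word
Hypothesis. It is not yet time to consider either the man or the problem but we
can now mention a peculiar result of his regarding the convergence of series
and its novel implications for γ (and any other number).

The (geometric) series $1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \cdots$ converges to $\frac{2}{3}$ and the series of
positive terms associated with it, $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$, to 2. Yet, it is not always the
case that sacrificing the cancellation brought about by omitting the minus signs
has such innocent consequences, and the harmonic series is a particular case in
point: we know that the alternating harmonic series converges (to ln 2) and that
the harmonic series diverges. This phenomenon is encapsulated in the concept
of ‘conditional convergence’, with the alternating harmonic series conditionally
convergent and that alternating geometric series above ‘absolutely convergent’.
Conditionally convergent series are delicate, as we can see from

$$\begin{aligned}
1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \cdots
&= \left(1 - \frac{1}{2}\right) - \frac{1}{4} + \left(\frac{1}{3} - \frac{1}{6}\right) - \frac{1}{8} + \left(\frac{1}{5} - \frac{1}{10}\right) - \frac{1}{12} + \cdots \\
&= \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \frac{1}{10} - \frac{1}{12} + \cdots
= \frac{1}{2}\left(1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots\right)
\end{aligned}$$

and the alternating harmonic series now converges to half of itself!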

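Riemann’s peculiar result is that any conditionally convergent series can be
made to sum to any number at all! For example, if we wish an arrangement of
the alternating harmonic series to sum to the Golden Ratio $\varphi = \frac{1}{2}(1 + \sqrt{5})$,
that arrangement begins

$$\begin{aligned}
\varphi = 1 + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} - \frac{1}{2} &+ \frac{1}{9} + \frac{1}{11} + \frac{1}{13} + \frac{1}{15} + \frac{1}{17} + \frac{1}{19} - \frac{1}{4} \\
&+ \frac{1}{21} + \frac{1}{23} + \frac{1}{25} + \frac{1}{27} + \frac{1}{29} + \frac{1}{31} - \frac{1}{6} + \cdots
\end{aligned}$$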

We can manufacture an arrangement to sum to any given number l by adding
in as many of the positive, odd terms as are needed to make the sum exceed
l, bring in the negative even terms to bring the sum below l, and continue in
this see-saw way for as long as we please; the divergence of each of the two
subseries guarantees that we will always be able to do this, and because the
terms themselves shrink to 0, the swings about l die away and the sums converge to l.
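The see-saw construction is short enough to run directly. In the sketch below (our own code, in floating point) each signed denominator records one term, +k for $1/k$ and $-k$ for $-1/k$. With target φ it reproduces the opening of the arrangement given above, and after many terms the partial sum sits close to φ.

```python
import math

def rearrange(target, n_terms):
    """Riemann's see-saw: while the running sum is <= target take the next
    positive odd term 1/(2k-1), otherwise the next negative even term -1/(2k).
    Returns the signed denominators used and the final partial sum."""
    next_odd, next_even = 1, 2
    total, signed = 0.0, []
    for _ in range(n_terms):
        if total <= target:
            total += 1 / next_odd
            signed.append(next_odd)
            next_odd += 2
        else:
            total -= 1 / next_even
            signed.append(-next_even)
            next_even += 2
    return signed, total

phi = (1 + math.sqrt(5)) / 2
terms, _ = rearrange(phi, 19)
print(terms)
# [1, 3, 5, 7, -2, 9, 11, 13, 15, 17, 19, -4, 21, 23, 25, 27, 29, 31, -6]
_, s_long = rearrange(phi, 200_000)
print(s_long - phi)   # small: no bigger than the last term used
```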

There are general results associated with this phenomenon, the proof of one
of which naturally brings in (and takes out) γ and to look at it we need the
concept of a ‘simple’ arrangement of the alternating harmonic series. Such an
arrangement is defined to be any in which the terms of the two subsequences of
positive and of negative terms each appear in their natural order of decreasing size.
For example, the rearrangements which led to $\frac{1}{2}\ln 2$ and to φ are both simple,
yet a rearrangement in which, say, $\frac{1}{5}$ appears before $\frac{1}{3}$ is not.
With this definition in place, let $p_n$ be the number of positive and $q_n$ the
number of negative terms in the first n terms of a simply rearranged alternating
harmonic series; then the result to interest us is that the rearrangement converges
if and only if

$$\alpha = \lim_{n\to\infty} \frac{p_n}{q_n}$$

exists, in which case the sum is $\ln 2 + \frac{1}{2}\ln\alpha$. Above we have $\alpha = \frac{1}{2}$ and so the
sum is $\ln 2 + \frac{1}{2}\ln\frac{1}{2} = \frac{1}{2}\ln 2$.

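To establish the result, write the sum of the first n terms of the series as
$\sum_{k=1}^{n} a_k$; then

$$\sum_{k=1}^{n} a_k = \sum_{k=1}^{p_n} \frac{1}{2k - 1} - \sum_{k=1}^{q_n} \frac{1}{2k}.$$

But

$$\sum_{k=1}^{p_n} \frac{1}{2k - 1} = \sum_{k=1}^{2p_n} \frac{1}{k} - \sum_{k=1}^{p_n} \frac{1}{2k}$$

and so

$$\begin{aligned}
\sum_{k=1}^{n} a_k &= \sum_{k=1}^{2p_n} \frac{1}{k} - \sum_{k=1}^{p_n} \frac{1}{2k} - \sum_{k=1}^{q_n} \frac{1}{2k} = H_{2p_n} - \tfrac{1}{2}H_{p_n} - \tfrac{1}{2}H_{q_n} \\
&= (\ln 2p_n + \gamma_{2p_n}) - \tfrac{1}{2}(\ln p_n + \gamma_{p_n}) - \tfrac{1}{2}(\ln q_n + \gamma_{q_n}) \\
&= \ln 2 + \tfrac{1}{2}\ln\left(\frac{p_n}{q_n}\right) + \gamma_{2p_n} - \tfrac{1}{2}\gamma_{p_n} - \tfrac{1}{2}\gamma_{q_n},
\end{aligned}$$

where $\gamma_m = H_m - \ln m$ are the approximations to γ to that number of terms.
Therefore, since $p_n$ and $q_n$ both tend to infinity with n,

$$\lim_{n\to\infty} \sum_{k=1}^{n} a_k = \ln 2 + \tfrac{1}{2}\ln\alpha + \gamma - \tfrac{1}{2}\gamma - \tfrac{1}{2}\gamma = \ln 2 + \tfrac{1}{2}\ln\alpha$$

and we are done.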
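A quick numerical check of $\ln 2 + \frac{1}{2}\ln\alpha$, in a sketch of our own: build a simple rearrangement from repeated blocks of p positive terms followed by q negative terms, so that $p_n/q_n \to p/q$, and compare a long partial sum with the formula. The block ordering differs from the interleaved patterns in the text, but by the result just proved only the limiting ratio matters.

```python
import math

def simple_rearrangement(pos_per_block, neg_per_block, blocks):
    """Partial sum of a simple rearrangement of 1 - 1/2 + 1/3 - ... made of
    repeated blocks: pos_per_block odd (positive) terms, then neg_per_block
    even (negative) terms, each subsequence taken in its natural order."""
    odd, even, total = 1, 2, 0.0
    for _ in range(blocks):
        for _ in range(pos_per_block):
            total += 1 / odd
            odd += 2
        for _ in range(neg_per_block):
            total -= 1 / even
            even += 2
    return total

for p, q in [(1, 2), (9, 4), (1, 1)]:
    predicted = math.log(2) + 0.5 * math.log(p / q)
    print(p, q, simple_rearrangement(p, q, 100_000), predicted)
```

With ratios $\frac{1}{2}$, $\frac{9}{4}$ and 1 the predictions are $\frac{1}{2}\ln 2$, $\ln 3$ and $\ln 2$; the partial sums after 100 000 blocks agree to about five decimal places.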

If, for example, we wish to write ln 3 in terms of the alternating harmonic
series it must be that

$$\ln 3 = \ln 2 + \tfrac{1}{2}\ln\alpha,$$

which makes $\alpha = \frac{9}{4}$, and indeed the representation is

$$\begin{aligned}
\ln 3 = 1 &+ \left(\frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + \frac{1}{13} + \frac{1}{15} - \frac{1}{8} + \frac{1}{17} + \frac{1}{19}\right) \\
&+ \left(\frac{1}{21} - \frac{1}{10} + \frac{1}{23} + \frac{1}{25} - \frac{1}{12} + \frac{1}{27} + \frac{1}{29} - \frac{1}{14} + \frac{1}{31} + \frac{1}{33} - \frac{1}{16} + \frac{1}{35} + \frac{1}{37}\right) + \cdots,
\end{aligned}$$

where the bracketing groups equal numbers of the terms by matching patterns
of the signs. Each group comprises nine + signs and four − signs and with this
pattern repeated throughout the expansion we will have $\alpha = \frac{9}{4}$ as required.

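Of course α is the limit of $p_n/q_n$ and we cannot expect in general the limit to
reveal itself by simple repetition. Recall that Euler had hoped that γ might be
the logarithm of some important number. If that is the case, it would be possible
to write

$$\gamma = \ln 2 + \tfrac{1}{2}\ln\alpha,$$

where the α is the ‘important’ limit of the ratio of the + and − signs in its
representation in terms of the alternating harmonic series, which starts

$$\begin{aligned}
\gamma = 1 - \frac{1}{2} + \frac{1}{3} + \Bigg\{ &\left(-\frac{1}{4} - \frac{1}{6} + \frac{1}{5} - \frac{1}{8} + \frac{1}{7} - \frac{1}{10} + \frac{1}{9} - \frac{1}{12} + \frac{1}{11}\right) \\
+ &\left(-\frac{1}{14} - \frac{1}{16} + \frac{1}{13} - \frac{1}{18} + \frac{1}{15} - \frac{1}{20} + \frac{1}{17} - \frac{1}{22} + \frac{1}{19}\right) \\
+ &\left(-\frac{1}{24} - \frac{1}{26} + \frac{1}{21} - \frac{1}{28} + \frac{1}{23} - \frac{1}{30} + \frac{1}{25} - \frac{1}{32} + \frac{1}{27}\right) \\
+ &\left(-\frac{1}{34} - \frac{1}{36} + \frac{1}{29} - \frac{1}{38} + \frac{1}{31} - \frac{1}{40} + \frac{1}{33} - \frac{1}{42} + \frac{1}{35}\right) \\
+ &\left(-\frac{1}{44} - \frac{1}{46} + \frac{1}{37} - \frac{1}{48} + \frac{1}{39} - \frac{1}{50} + \frac{1}{41} - \frac{1}{52} + \frac{1}{43}\right) \\
+ &\left(-\frac{1}{54} - \frac{1}{56} + \frac{1}{45} - \frac{1}{58} + \frac{1}{47} - \frac{1}{60} + \frac{1}{49}\right) \Bigg\} + \cdots
\end{aligned}$$

Within the braces there are five round brackets, each containing nine terms with
identical sign patterns and, at the end, one bracket of seven terms. If this repeat
is continued, we would have 23 + signs and 29 − signs in the repeated cycle
of 52 terms; this would mean that $\alpha = \frac{23}{29}$ and $\gamma = \ln 2 + \frac{1}{2}\ln\frac{23}{29}$, and we will
have outdone the great Euler! His constant is then

$$\gamma = \ln 2 + \tfrac{1}{2}\ln\tfrac{23}{29} = \tfrac{1}{2}\ln\tfrac{92}{29}.$$

Unfortunately, it isn’t, since this evaluates to 0.577 246.... That cycle was too
much to hope for and the pattern breaks at around the 550th term. Is there a pattern
with a longer repetition? Who knows? If $\gamma = \ln 2 + \frac{1}{2}\ln\alpha$, then $\alpha = \frac{1}{4}e^{2\gamma}$
and the convergents of the continued fraction of this number are

$$\frac{3}{4}, \frac{4}{5}, \frac{19}{24}, \frac{23}{29}, \frac{548}{691}, \frac{571}{720}, \frac{1119}{1411}, \frac{2809}{3542}, \frac{6737}{8495}, \frac{63\,442}{79\,997}, \frac{450\,831}{568\,474}, \ldots$$

and at least we have hit on one of them, with our $\frac{23}{29}$ making an appearance.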
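Two of the numbers in this paragraph can be checked with a short sketch: the value of $\ln 2 + \frac{1}{2}\ln\frac{23}{29}$, and the convergents of $\alpha = \frac{1}{4}e^{2\gamma}$. The sketch assumes double precision is enough for these first dozen convergents (their denominators stay below $10^6$) and converts the floating-point α to an exact fraction before expanding it.

```python
import math
from fractions import Fraction

GAMMA = 0.5772156649015329      # Euler's constant to double precision

def convergents(x: Fraction, count: int) -> list[Fraction]:
    """First `count` convergents of x, built as the partial quotients appear."""
    out = []
    p_prev, p, q_prev, q = 0, 1, 1, 0    # seeds p_{-2}, p_{-1}, q_{-2}, q_{-1}
    for _ in range(count):
        a = x.numerator // x.denominator
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(Fraction(p, q))
        if x == a:
            break
        x = 1 / (x - a)
    return out

alpha = Fraction(math.exp(2 * GAMMA) / 4)      # the double, converted exactly
print([str(c) for c in convergents(alpha, 13)[2:]])
# ['3/4', '4/5', '19/24', '23/29', '548/691', '571/720', '1119/1411',
#  '2809/3542', '6737/8495', '63442/79997', '450831/568474']

cycle_guess = math.log(2) + 0.5 * math.log(23 / 29)
print(cycle_guess, cycle_guess - GAMMA)        # about 0.5772464 and 3.1e-5
```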


12.2 In Analysis 


One of the (many) problems with integration is that we cannot always integrate a
function in ‘closed form’; that is, no finite combination of the usual functions of
mathematics will combine to be the anti-derivative of the function, and there is
often only a slight change needed to convert possible to impossible, or the other
way around. For example, $\ln u$, $u\ln u$, $(\ln u)/u$, $1/(u\ln u)$ are all straightforward
to integrate, yet $1/\ln u$, $u/\ln u$ are simply not possible. The irksome thing is
that some of these ‘difficult’ integrals occur with great frequency and in many
important applications, so much so that they lose their anonymity and are given
names. For example,

$$\int \frac{\sin u}{u}\,du, \qquad \int \frac{\cos u}{u}\,du, \qquad \int \frac{e^{-u}}{u}\,du$$

are all impossible in closed form, and definite integrals of this kind give rise to the functions

$$\begin{aligned}
\operatorname{erf}(x) &= \frac{2}{\sqrt{\pi}}\int_0^x e^{-u^2}\,du && \text{(the error function)},\\
\operatorname{Li}(x) &= \int_2^x \frac{du}{\ln u} && \text{(the logarithmic integral)},\\
\operatorname{Ci}(x) &= \int_x^\infty \frac{\cos u}{u}\,du && \text{(the cosine integral)},\\
\operatorname{Si}(x) &= \int_0^x \frac{\sin u}{u}\,du && \text{(the sine integral)},\\
\operatorname{Ei}(x) &= \int_x^\infty \frac{e^{-u}}{u}\,du && \text{(the exponential integral)}.
\end{aligned}$$

They each appear in their different ways and places.

Laplace’s error function, erf(x), is easily recognized as essentially the cumulative
distribution function of the Normal distribution (the constant is needed to
make the total area under the bell-shaped curve 1, so that $\operatorname{erf}(x) \to 1$ as $x \to \infty$).

Li(x) appears regularly in number theory in estimates of asymptotic values,
including a conjecture of Littlewood and Hardy concerning the Goldbach Conjecture
(mentioned in a few lines). It will later become the central focus of our
attention when it appears as Gauss’s estimate of the prime counting function
π(x). Euler had, of course, already considered the function (in 1768). Subsequently
it appears in the work of Mascheroni (1790) and Caluso (1805) but it
came to prominence (and was given its name) after it was the object of study in
Soldner’s Theory of a New Transcendental Function of 1809 (admittedly with
the alternative lower limit of 0). We mentioned this work on p. 89. It was in that
paper that Soldner gave that corrected value of γ and also the series expansion
of Li(x) as

$$\operatorname{Li}(x) = \gamma + \ln\ln x + \sum_{r=1}^{\infty} \frac{\ln^r x}{r \cdot r!}.$$

Ci(x) has the similar form

$$\operatorname{Ci}(x) = -\gamma - \ln x - \sum_{r=1}^{\infty} \frac{(-x^2)^r}{2r\,(2r)!},$$

but Si(x) involves neither ln nor γ in its expansion of

$$\operatorname{Si}(x) = \sum_{r=1}^{\infty} (-1)^{r-1} \frac{x^{2r-1}}{(2r-1)(2r-1)!}.$$


This last trio work hand-in-hand in many applications and in widely diverse 
areas of mathematics, including quantum field theory, electromagnetic theory, 
semiconductor physics, and analysis of the Gibbs phenomena of Fourier analy- 
sis (the misbehaved bits at the fly-back points). 
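Soldner-type series are easy to test against direct numerical integration. The sketch below is ours: it uses Simpson's rule with an arbitrary 2000 sub-intervals, compares a difference of values of the Li series with $\int_2^{10} du/\ln u$ (so the choice of lower limit does not matter), and checks the Si series at x = 2.

```python
import math

EULER_GAMMA = 0.5772156649015329

def li_series(x, terms=60):
    """Soldner's series gamma + ln ln x + sum_{r>=1} (ln x)^r / (r * r!), for x > 1."""
    L = math.log(x)
    total, power_over_fact = 0.0, 1.0
    for r in range(1, terms + 1):
        power_over_fact *= L / r            # now (ln x)^r / r!
        total += power_over_fact / r
    return EULER_GAMMA + math.log(L) + total

def si_series(x, terms=30):
    """Si(x) = sum_{r>=1} (-1)^(r-1) x^(2r-1) / ((2r-1) (2r-1)!)."""
    total, term = 0.0, x                    # term = (-1)^(r-1) x^(2r-1) / (2r-1)!
    for r in range(1, terms + 1):
        total += term / (2 * r - 1)
        term *= -x * x / ((2 * r) * (2 * r + 1))
    return total

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

def sinc(u):
    return math.sin(u) / u if u != 0 else 1.0

# a difference of Soldner values removes any dependence on the lower limit
print(li_series(10) - li_series(2), simpson(lambda u: 1 / math.log(u), 2, 10))
print(si_series(2.0), simpson(sinc, 0.0, 2.0))
```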

Ei(x) is important partly because the integral of any function of the form
$R(x)e^x$, where R(x) is a rational function, can be shown to reduce to elementary
integrals and Ei(x).

γ also appears in what are known as ‘modified Bessel functions of the second
kind’, named after the German astronomer F. W. Bessel (1784-1846), although
they were studied earlier by yet another Bernoulli (Daniel) (1700-1782) and,
inevitably, Euler. These functions appear among the solutions of what is known
as the Bessel Equation,

$$x^2\frac{d^2y}{dx^2} + x\frac{dy}{dx} + (x^2 - \alpha^2)y = 0,$$

where $\alpha \ge 0$ is constant. It arises in the study of problems concerning vibrations
of membranes, heat flow in cylinders and the propagation of electric currents in
cylindrical conductors, and in some of the problems of analytic number theory.

Other, nameless integrals and limits involving γ are easy to find. Recall from
p. 58 that $\gamma = -\Gamma'(1)$ and we have, using one of the definitions of the Gamma
function (and differentiating under the integral sign),

$$\Gamma(x) = \int_0^\infty u^{x-1}e^{-u}\,du = \int_0^\infty e^{(x-1)\ln u}e^{-u}\,du,$$

so

$$\Gamma'(x) = \int_0^\infty u^{x-1}e^{-u}\ln u\,du \quad\text{and}\quad \Gamma'(1) = \int_0^\infty e^{-u}\ln u\,du,$$

which makes

$$\gamma = -\int_0^\infty e^{-u}\ln u\,du.$$

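Increasing the level of ingenuity develops this into a more exotic result:

$$-\gamma = \int_0^\infty e^{-u}\ln u\,du = \int_0^1 e^{-u}\ln u\,du + \int_1^\infty e^{-u}\ln u\,du.$$

Now we integrate by parts in each case, the second integral perfectly straightforwardly,
integrating $e^{-u}$ to $-e^{-u}$, but the first by using the underhand trick
of integrating $e^{-u}$ to $-e^{-u} + 1$ to get

$$-\gamma = \Big[(-e^{-u} + 1)\ln u\Big]_0^1 - \int_0^1 \frac{-e^{-u} + 1}{u}\,du + \Big[-e^{-u}\ln u\Big]_1^\infty + \int_1^\infty \frac{e^{-u}}{u}\,du.$$

The two evaluated components are both 0: $\ln 1 = 0$, the exponential drowns the
logarithm as $u \to \infty$, and as $u \to 0$ the factor $1 - e^{-u}$ (which behaves like u)
overwhelms $\ln u$, which is the point of the underhand trick. And so

$$-\gamma = -\int_0^1 \frac{-e^{-u} + 1}{u}\,du + \int_1^\infty \frac{e^{-u}}{u}\,du$$

and

$$\gamma = \int_0^1 \frac{1 - e^{-u}}{u}\,du - \int_1^\infty \frac{e^{-u}}{u}\,du.$$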
Finally, if we make the substitution u = 1/t in the second integral and swap
back variables to u we get

$$\gamma = \int_0^1 \frac{1 - e^{-u}}{u}\,du - \int_0^1 \frac{e^{-1/u}}{u}\,du = \int_0^1 \frac{1 - e^{-u} - e^{-1/u}}{u}\,du.$$

This is not only a fearsome integral conquered but also an integral definition of
γ over a finite interval, which can be used to calculate its value, provided that for
reasons of continuity we agree to define the integrand to be 1 at u = 0 (graphing
the function is a good test for any graph plotter and numerically approximating
the area an even better one).
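The invitation to approximate the area numerically can be taken up directly. This is a sketch of our own using Simpson's rule on 2000 sub-intervals (an arbitrary choice). `math.expm1` is used for $1 - e^{-u}$ to avoid cancellation near u = 0, and the integrand is set to 1 at u = 0 as the text prescribes.

```python
import math

def integrand(u):
    """(1 - e^{-u} - e^{-1/u}) / u, taken to be 1 at u = 0 by continuity."""
    if u == 0.0:
        return 1.0
    # -expm1(-u) computes 1 - e^{-u} accurately for small u;
    # exp(-1/u) simply underflows to 0.0 when u is tiny
    return (-math.expm1(-u) - math.exp(-1.0 / u)) / u

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

estimate = simpson(integrand, 0.0, 1.0, 2000)
print(estimate)   # agrees with 0.5772156649... to many places
```

The integrand is smooth at u = 0 (every derivative of $e^{-1/u}/u$ tends to 0 there), which is why such a plain rule does well.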

With this result we can derive the series expansion for the Ei(x) function,
using a method very similar to Soldner’s for Li(x).

Assume x ≥ 1 (the ideas still work if x < 1), then

$$\begin{aligned}
\operatorname{Ei}(x) &= \int_x^\infty \frac{e^{-u}}{u}\,du = \int_1^\infty \frac{e^{-u}}{u}\,du - \int_1^x \frac{e^{-u}}{u}\,du \\
&= \int_1^\infty \frac{e^{-u}}{u}\,du - \int_1^x \frac{e^{-u} - 1}{u}\,du - \int_1^x \frac{1}{u}\,du \\
&= \int_1^\infty \frac{e^{-u}}{u}\,du - \int_0^1 \frac{1 - e^{-u}}{u}\,du - \int_0^x \frac{e^{-u} - 1}{u}\,du - \ln x \\
&= -\gamma - \int_0^x \frac{e^{-u} - 1}{u}\,du - \ln x
= -\gamma - \sum_{r=1}^{\infty} \frac{(-1)^r}{r!}\int_0^x u^{r-1}\,du - \ln x \\
&= -\gamma - \ln x - \sum_{r=1}^{\infty} \frac{(-x)^r}{r \cdot r!}.
\end{aligned}$$

We have used the standard Taylor expansion of $e^{-u}$ to deal with the third integral.
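As a sanity check on the final series, one can compare it with a direct numerical evaluation of $\int_x^\infty e^{-u}/u\,du$, taking Ei in the sense used above. The sketch truncates the integral at u = 60 (the neglected tail is below $e^{-60}$) and uses Simpson's rule; both choices are ours. It also tries x < 1, where the text says the ideas still work.

```python
import math

EULER_GAMMA = 0.5772156649015329

def ei_series(x, terms=80):
    """-gamma - ln x - sum_{r>=1} (-x)^r / (r * r!), where Ei(x) means the
    integral of e^{-u}/u from x to infinity, as in the text."""
    total, term = 0.0, 1.0
    for r in range(1, terms + 1):
        term *= -x / r                  # now (-x)^r / r!
        total += term / r
    return -EULER_GAMMA - math.log(x) - total

def ei_numeric(x, upper=60.0, n=60_000):
    """Simpson's rule on [x, upper]; the tail beyond `upper` is below e^{-upper}."""
    f = lambda u: math.exp(-u) / u
    h = (upper - x) / n
    s = f(x) + f(upper) + sum((4 if i % 2 else 2) * f(x + i * h) for i in range(1, n))
    return s * h / 3

for x in (0.5, 1.0, 3.0):
    print(x, ei_series(x), ei_numeric(x))
```

At x = 1 both give 0.219 383 934..., the familiar value of this integral.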
There are countless other integrals, sums and products in which
γ is involved and below we list a few more examples:



The two integrals evaluating to expressions involving π display nice relationships
between π, e and γ. The two product forms were both arrived at in 1874 by
Franz Mertens (1840-1927) and we will have use of one of them in Chapter 15.
The p that appears is prime and the first form can be developed to the very nice



which is reminiscent of Gamma’s definition, but using primes only. 

The second summation result involves the von Mangoldt function

$$\Lambda(r) = \begin{cases} \ln p, & r = p^m,\ p \text{ prime},\\ 0, & \text{otherwise}, \end{cases}$$

which will come to our closer attention again in Chapter 16.

Each expression in the list is established in its own way and we will content 
ourselves with proving just two of them: the one involving the Floor function 
and the other the Zeta function, and so keep an earlier promise, made on p. 52. 


Firstly we will deal with

$$\int_1^\infty \frac{\{x\}}{x^2}\,dx = 1 - \gamma,$$

* Sondow (1998).
where the notation {x} is used for the fractional part of x and is therefore related
to the Floor function by {x} = x − ⌊x⌋. It seems impossible ever to arrive at an
exact answer for such a strange integral, but we will see that γ naturally makes
an appearance which solves the problem.

To begin with, by definition of the Floor function, 



We need to rewrite this expression and to do so imagine the interval divided 
into unit sub-intervals with the right-hand-side end point excluded, then 



Now consider the expression 



In this sum the interval [1, 2) is covered just once, the interval [2, 3) is covered
twice, the interval [3, 4) is covered three times, ..., the interval [n − 1, n) is
covered n − 1 times, and that is precisely what Equation (12.1) is saying. The
two are the same. Therefore,

So,


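This means that

$$H_n = 1 + \int_1^n \frac{\lfloor x\rfloor}{x^2}\,dx = 1 + \int_1^n \frac{x - \{x\}}{x^2}\,dx = 1 + \int_1^n \frac{dx}{x} - \int_1^n \frac{\{x\}}{x^2}\,dx = 1 + \ln n - \int_1^n \frac{\{x\}}{x^2}\,dx,$$

and so

$$\gamma = \lim_{n\to\infty}(H_n - \ln n) = 1 - \int_1^\infty \frac{\{x\}}{x^2}\,dx \quad\text{and}\quad \int_1^\infty \frac{\{x\}}{x^2}\,dx = 1 - \gamma,$$

as required.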
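The result can also be checked numerically. On each unit interval [r, r + 1) we have {x} = x − r, and ∫_r^{r+1} (x − r)/x² dx = ln((r + 1)/r) − 1/(r + 1), so the integral up to n can be summed exactly piece by piece. This is a minimal Python sketch with an illustrative helper name; the printed values should approach 1 − γ ≈ 0.422 784 as n grows.

```python
import math

EULER_GAMMA = 0.5772156649015329


def frac_part_integral(n: int) -> float:
    """∫_1^n {x}/x^2 dx, summed exactly over the unit intervals [r, r+1)."""
    # On [r, r+1), {x} = x - r, and ∫_r^{r+1} (x - r)/x^2 dx = ln((r+1)/r) - 1/(r+1).
    return math.fsum(math.log1p(1 / r) - 1 / (r + 1) for r in range(1, n))


for n in (10, 1_000, 100_000):
    print(n, frac_part_integral(n), 1 - EULER_GAMMA)
```

Summing the pieces gives exactly 1 + ln n − H_n, the same expression as in the derivation, so the convergence to 1 − γ is just the convergence of H_n − ln n to γ.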

With Euler (and us) failing to identify γ in terms of the logarithm of an
important number, we mentioned on p. 52 that he provided a number of formulae
for its evaluation, one of which was


$$\sum_{i=2}^\infty \frac{1}{i}\bigl(\zeta(i) - 1\bigr) = 1 - \gamma,$$

which he used to calculate the value to five decimal places. We will now derive 
his expression: 


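$$
\begin{aligned}
\gamma &= \lim_{n\to\infty}\left(\sum_{r=1}^n \frac{1}{r} - \ln n\right) = \lim_{n\to\infty}\left(\sum_{r=1}^n \frac{1}{r} - \sum_{r=2}^n \ln\frac{r}{r-1}\right)\\
&= 1 + \sum_{r=2}^\infty\left(\frac{1}{r} + \ln\left(1 - \frac{1}{r}\right)\right) = 1 + \sum_{r=2}^\infty\left(\frac{1}{r} - \sum_{i=1}^\infty \frac{1}{i\,r^i}\right)\\
&= 1 - \sum_{r=2}^\infty \sum_{i=2}^\infty \frac{1}{i\,r^i} = 1 - \sum_{i=2}^\infty \frac{1}{i}\sum_{r=2}^\infty \frac{1}{r^i} = 1 - \sum_{i=2}^\infty \frac{1}{i}\bigl(\zeta(i) - 1\bigr),
\end{aligned}
$$

and the result clearly follows.

Yet again we have used the expansion of ln(1 − x) to eliminate the logarithm.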





CHAPTER 12 


Figure 12.1. Dirichlet’s result (horizontal axis n; the curve ln n + 2γ − 1 is shown).


Following Euler’s path a little further, we need 15 terms of the series to
achieve five decimal places of accuracy, and so

$$
\begin{aligned}
\gamma &\approx 1 - \tfrac{1}{2}\bigl(\zeta(2) - 1\bigr) - \tfrac{1}{3}\bigl(\zeta(3) - 1\bigr) - \tfrac{1}{4}\bigl(\zeta(4) - 1\bigr) - \tfrac{1}{5}\bigl(\zeta(5) - 1\bigr) - \cdots - \tfrac{1}{15}\bigl(\zeta(15) - 1\bigr)\\
&= 1 - (0.322\,467 + 0.067\,352\,3 + 0.020\,580\,8 + \cdots + 0.000\,002\,039\,22)\\
&= 0.577\,217\ldots,
\end{aligned}
$$

which is perfectly easy to calculate with a modern computer running state-of-the-art software...
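Euler's arithmetic is easy to repeat. In this Python sketch ζ(i) − 1 is computed by summing 1/k^i directly for 2 ≤ k < 1000 and estimating the remaining tail with the first Euler-Maclaurin correction terms; that shortcut (and the helper names) is an assumption of the sketch, not Euler's own method. Truncating after i = 15 reproduces the 0.577 217... above, and taking more terms brings the estimate closer to γ.

```python
import math


def zeta_minus_one(s: int, N: int = 1000) -> float:
    """ζ(s) - 1 for integer s >= 2: direct sum over 2 <= k < N plus an
    Euler-Maclaurin estimate of the tail Σ_{k>=N} k^{-s}."""
    head = math.fsum(k ** -s for k in range(2, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail


def euler_gamma_estimate(last_i: int) -> float:
    """1 - Σ_{i=2}^{last_i} (ζ(i) - 1)/i, Euler's series truncated after i = last_i."""
    return 1 - math.fsum(zeta_minus_one(i) / i for i in range(2, last_i + 1))


print([round(zeta_minus_one(i) / i, 9) for i in (2, 3, 4)])  # first three bracketed terms
print(euler_gamma_estimate(15))  # truncated after (ζ(15) - 1)/15, as in the text
print(euler_gamma_estimate(60))  # many more terms
```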

12.3 In Number Theory 

Although γ’s appearance in number theory is no matter for surprise, the manner
of its appearance can be puzzling. We will list just a few ways in which it
emerges.

• In 1838 Lejeune Dirichlet (1805-1859) proved that

$$\frac{1}{n}\sum_{r=1}^{n}\left\lfloor\frac{n}{r}\right\rfloor,$$

the average number of divisors of all integers from 1 to n, approaches
ln n + 2γ − 1 as n increases (see Figure 12.1).

Further along the line,

$$\frac{1}{1000}\sum_{r=1}^{1000}\left\lfloor\frac{1000}{r}\right\rfloor$$

evaluates to 7.069, and ln 1000 + 2γ − 1 = 7.062 19... .

Figure 12.2. De la Vallée Poussin’s result (horizontal axis n).

• Equally baffling, in 1898 Charles de la Vallée Poussin (1866-1962) (more
of him later) proved that if we divide an integer n by all integers less than
it and average the deficits of each quotient to the integer above it, the
answer approaches γ as n → ∞. This time the calculation is

$$\frac{1}{n}\sum_{r=1}^{n}\left(\left\lceil\frac{n}{r}\right\rceil - \frac{n}{r}\right),$$

with the graph shown in Figure 12.2.

And, again, further down the line we have

$$\frac{1}{10\,000}\sum_{r=1}^{10\,000}\left(\left\lceil\frac{10\,000}{r}\right\rceil - \frac{10\,000}{r}\right),$$

which evaluates to 0.577 216... .

Incredibly, the result remains true if the divisors are those in any arithmetic
sequence or if they are only the prime divisors.

• γ also appears (rather messily) in three standard asymptotic measures of
the efficiency of the Euclidean Algorithm. In each case it appears because
of the implicit appearance of the Glaisher-Kinkelin constant, mentioned
on p. 88, and the explicit appearance of Porter’s constant, which is the
impressive

$$\frac{6\ln 2}{\pi^2}\left(3\ln 2 + 4\gamma - \frac{24}{\pi^2}\zeta'(2) - 2\right) - \frac{1}{2} = 1.467\,07\ldots.$$
• On p. 34 we mentioned Euler’s result regarding the possibility of representing
an integer as the sum of two squares. If it is possible to do so,
asymptotic estimates of the number of possible ways involve the Sierpiński
constant (2.584 981 7...), which itself involves γ and which is
rather too cumbersome to define!

To understand the last example it is necessary to appreciate a convergence
that might be extracted from a divergent sequence. If a bounded infinite
sequence {a_n} converges to a limit l, any infinite subsequence will converge
to that same limit. That is perfectly reasonable. Now suppose that
the sequence does not converge and that we consider the set L of limits of
all infinite subsequences that do converge (a technical result known as the
Bolzano-Weierstrass Theorem ensures that there is at least one such);
then L has a maximum and a minimum. If we write the maximum as
$\overline{l}$ and the minimum as $\underline{l}$, these are called the superior and inferior limits,
respectively, and they are written as

$$\overline{l} = \limsup_{n\to\infty} a_n \quad\text{and}\quad \underline{l} = \liminf_{n\to\infty} a_n.$$

For example, the oscillating sequence −1, 1, −1, 1, −1, 1, ... clearly
does not converge but has the two convergent subsequences 1, 1, 1, 1, 1,
1, ... and −1, −1, −1, −1, ... with limits of 1 and −1, of course. This
means that

$$\limsup_{n\to\infty} a_n = 1 \quad\text{and}\quad \liminf_{n\to\infty} a_n = -1.$$

A little more subtly,

$$\frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \frac{2}{4}, \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}, \frac{1}{6}, \ldots$$

does not converge but the subsequences 1/2, 1/3, 1/4, 1/5, ... and 1/2, 2/3, 3/4, 4/5,
... converge to 0 and 1, respectively, and so

$$\limsup_{n\to\infty} a_n = 1 \quad\text{and}\quad \liminf_{n\to\infty} a_n = 0.$$


With these ideas in place, we can at once mention an important idea in the
study of primes, list a truly impressive-looking formula, mention Erdős
once again, give another example of γ appearing and reveal a (typically
poor) mathematical joke. The length of the interval between consecutive
primes, p_{n+1} − p_n, is of clear importance and one of the consequences
of the Prime Number Theorem (which we are inexorably approaching)
is that, on average, p_{n+1} − p_n is about ln p_n. That said, the average in no
way typifies the sequence’s behaviour, as p_{n+1} − p_n oscillates wildly and
is very much a contender for lim sup and lim inf investigation.

The latter is the more problematic, as, at the time of writing, it was not even known whether

$$\liminf_{n\to\infty}\,(p_{n+1} - p_n) < \infty,$$

although Erdős (and others) have made some progress with this. It is the
lim sup that provides our stupendous formula, which is a 1990 result
of Maier and Pomerance, following a 1935 result of Erdős and also a
number of others in between,

$$\limsup_{n\to\infty}\frac{(p_{n+1} - p_n)(\log\log\log p_n)^2}{(\log p_n)(\log\log p_n)(\log\log\log\log p_n)} \ge \frac{4e^{\gamma}}{c},$$

where c = 3 + e^{−c}. Any comment would seem superfluous. It is the
natural logarithm, but to use ln would be to deny the opportunity to
mention the joke: what noise does a drowning Analytic Number Theorist
make? Log... log... log... log...

With this idea in place, we have finally the wildly divergent sequence
generated by Euler’s curiously named Totient function φ(n) (presumably
from the Latin ‘tot’, which means ‘so much’), which is defined to
be the number of positive integers not greater than n and co-prime to
n. It finds extensive use in very many number-theoretic investigations.
Edmund Landau (1877-1938) proved that

$$\limsup_{n\to\infty}\varphi(n) = \infty$$

but that

$$\liminf_{n\to\infty}\frac{\varphi(n)\ln\ln n}{n} = e^{-\gamma}.$$

He also proved that for N large,

$$\sum_{n=1}^{N}\frac{1}{\varphi(n)} \approx A\ln N + B,$$

where A is the elegant

$$\frac{\zeta(2)\,\zeta(3)}{\zeta(6)}$$

and B is distinctly inelegant but its expression contains π, ζ(3) and γ.
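The two results of Dirichlet and de la Vallée Poussin quoted above are easy to explore by computer. In this Python sketch (helper names illustrative) the deficit ⌈n/r⌉ − n/r is computed as ((−n) mod r)/r to avoid rounding in a floating-point ceiling; the printed columns are Dirichlet's average, ln n + 2γ − 1, and de la Vallée Poussin's average.

```python
import math

EULER_GAMMA = 0.5772156649015329


def dirichlet_average(n: int) -> float:
    """(1/n) Σ_{r=1}^{n} ⌊n/r⌋: the average number of divisors of 1, ..., n."""
    return sum(n // r for r in range(1, n + 1)) / n


def poussin_average(n: int) -> float:
    """(1/n) Σ_{r=1}^{n} (⌈n/r⌉ - n/r): the average deficit to the next integer."""
    # ⌈n/r⌉ - n/r equals ((-n) mod r)/r, computed here without a float ceiling.
    return math.fsum((-n) % r / r for r in range(1, n + 1)) / n


for n in (100, 1_000, 10_000):
    print(n, dirichlet_average(n), math.log(n) + 2 * EULER_GAMMA - 1, poussin_average(n))
```

Counting ⌊n/r⌋ works because it is the number of multiples of r up to n, so summing over r counts every divisor of every integer from 1 to n exactly once.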

As an example of the elegance and usefulness of the Totient function, the
reader should be aware that it might help with a route to mathematical immortality
in that if the Goldbach Conjecture is true (every even number greater than 2
is the sum of two primes), then for all positive integers n there are primes p,
q such that φ(p) + φ(q) = 2n.
Figure 12.3. Euler’s Totient function (φ(n) plotted against n).
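Landau's sum and the Goldbach connection can both be explored with a table of φ(n) built by a sieve. In this Python sketch the value of ζ(3) is hard-coded, A is formed from ζ(2) = π²/6 and ζ(6) = π⁶/945, and the helper names are illustrative; the difference Σ 1/φ(n) − A ln N should settle down to a constant (Landau's B) as N grows. The final loop checks the identity φ(p) + φ(q) = 2n for small n, using whichever Goldbach pair for 2n + 2 it finds first.

```python
import math


def totients(limit: int) -> list:
    """φ(0..limit) by a sieve: for each prime p, multiply φ(m) by (1 - 1/p) when p | m."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:  # untouched so far, hence p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
    return phi


ZETA3 = 1.2020569031595942                               # ζ(3), hard-coded
A = (math.pi ** 2 / 6) * ZETA3 / (math.pi ** 6 / 945)   # ζ(2)ζ(3)/ζ(6)

phi = totients(40_000)


def landau_gap(N: int) -> float:
    """Σ_{n≤N} 1/φ(n) - A ln N, which should settle down to Landau's constant B."""
    return math.fsum(1 / phi[n] for n in range(1, N + 1)) - A * math.log(N)


print(A, landau_gap(10_000), landau_gap(40_000))


def is_prime(k: int) -> bool:
    return k >= 2 and phi[k] == k - 1  # φ(k) = k - 1 exactly when k is prime


# If 2n + 2 = p + q with p, q prime, then φ(p) + φ(q) = (p - 1) + (q - 1) = 2n.
for n in range(1, 200):
    p = next(p for p in range(2, 2 * n + 2) if is_prime(p) and is_prime(2 * n + 2 - p))
    assert phi[p] + phi[2 * n + 2 - p] == 2 * n
```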


12.4 In Conjecture 

• Suppose that we toss a fair coin indefinitely and record the sequence
of heads and tails. Now we choose an integer n and list all 2^n possible
sequences of n heads and tails. How many times will we have to toss the
coin in order to see each of our sequences appear? It is known that the
minimum possible number of tosses is 2^n + n − 1 and it is conjectured
that the average number approaches 2^n(γ + n ln 2) for large n.

• A second conjecture concerns Mersenne primes, which are primes of the
form 2^p − 1, where p is prime (a natural hunting ground for big primes).
It has been conjectured that if M(x) is the number of primes p ≤ x for
which 2^p − 1 is prime then M(x) ~ k ln x, where k = e^γ/ln 2. Since
there are only 42 known Mersenne primes (as this paperback edition goes
to press), the evidence has to be considered a touch scanty.
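The conjecture can be compared with the small Mersenne exponents. This Python sketch counts M(x) using the Lucas-Lehmer test (a standard primality test for 2^p − 1 that the text does not mention) and prints it beside k ln x with k = e^γ/ln 2; the helper names are illustrative, and with so few Mersenne primes the comparison is, as the text says, scanty.

```python
import math

EULER_GAMMA = 0.5772156649015329


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def lucas_lehmer(p: int) -> bool:
    """True if 2^p - 1 is prime, for a prime exponent p (Lucas-Lehmer test)."""
    if p == 2:
        return True  # 2^2 - 1 = 3; the iteration below needs p odd
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def M(x: int) -> int:
    """Number of primes p <= x with 2^p - 1 prime."""
    return sum(1 for p in range(2, x + 1) if is_prime(p) and lucas_lehmer(p))


k = math.exp(EULER_GAMMA) / math.log(2)
for x in (100, 500, 1_000):
    print(x, M(x), round(k * math.log(x), 2))
```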


12.5 In Generalization 

Carl Gustav Jacobi (1804-1851) is quoted as saying, ‘One should always gen- 
eralize’, and such a view is very much part of mathematical philosophy, but 
there are often several directions in which the generalization could be made. So
it is with γ.

• We could move into two dimensions — but how? We will describe one 
way, which leads to the Masser-Gramain constant and requires a different 
approach to the harmonic series. Take the real line and the positive integer 
points on it and select the origin as a fixed reference point, then the interval 
[0, 1], of length 1, is the smallest interval containing the integer 1; the 
interval [0, 2], of length 2, is the smallest interval containing the integer 
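2, etc., and we can think of the expression

$$\gamma = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{1}{r} - \ln n\right)$$

as

$$\gamma = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{1}{\text{length of interval }[0, r]} - \ln n\right).$$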

Having generalized ‘interval’, we now need to generalize ‘integer’. For
300 years, Fermat’s Last Theorem had been a ‘primordial soup’ from
which vast and vastly important areas of mathematics were developed.
We have already mentioned its connection with the Bernoulli Numbers.
With Andrew Wiles (born 1953) finally settling the matter, it may
be that it has given its final ‘birth’, but in the mathematical ferment of the
19th century it brought about the development of numbers of the form
a + b√−1, where a and b are rational numbers; when both are integers,
these are called Gaussian integers (named after ...). Now we can
move to two dimensions and from ℝ to ℂ to define the exotic


$$\delta = \lim_{n\to\infty}\left(\sum_{r=2}^{n}\frac{1}{\pi\rho_r^2} - \ln n\right),$$

the Masser-Gramain constant (we need to start at 2 to make the definition
sensible). The denominators are the two-dimensional equivalent of
interval length: they are the areas of discs, and the ρ_r are defined by

ρ_r = min{ρ : there is a closed disc with radius ρ containing
at least r distinct Gaussian integers}.

Perhaps it is not surprising that the exact value of the constant isn’t known!


Euler (naturally) embraced the idea of generalization and did so by considering

$$\gamma = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{1}{r} - \ln n\right)$$

as

$$\lim_{n\to\infty}\left(\sum_{r=1}^{n} f(r) - \int_1^n f(x)\,dx\right)$$

with f(x) = 1/x as just one particular positive, decreasing function.
From this he generalized to

$$f(x) = \frac{1}{x^a}, \quad\text{where } 0 < a < 1,$$

to produce two divergent components that combine to converge to finite
sums, known as Euler’s generalized constants.

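With f(x) = ln^m x/x, with m a non-negative integer, we have a final family
of generalizations to which we will refer, known as Stieltjes constants
γ_m, about which not so very much is known. Their definition is

$$\gamma_m = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{\ln^m r}{r} - \int_1^n \frac{\ln^m x}{x}\,dx\right) = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{\ln^m r}{r} - \left[\frac{\ln^{m+1} x}{m+1}\right]_1^n\right) = \lim_{n\to\infty}\left(\sum_{r=1}^{n}\frac{\ln^m r}{r} - \frac{\ln^{m+1} n}{m+1}\right).$$

Of course,

$$\gamma_0 = \gamma.$$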


These are of particular importance because of their appearance in the
series expansion of the complex form of the Zeta function (it is called the
Laurent expansion, which is discussed in Appendix D). To be exact,

$$\zeta(z) = \frac{1}{z - 1} + \sum_{r=0}^{\infty}\frac{(-1)^r}{r!}\,\gamma_r\,(z - 1)^r.$$


There are other generalizations (including a lattice sum form of immense 
complexity) but we hope that by now the point is made that Euler’s sim- 
ple and natural original definition can lead to interesting and sometimes 
important extensions. To paraphrase Andrew Wiles, ‘we think we will 
stop here’. 
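All of these constants have the form lim(Σ_{r≤n} f(r) − ∫_1^n f(x) dx), so one helper can estimate them. The Python sketch below sums up to n = 100 000 and subtracts the first two Euler-Maclaurin corrections, f(n)/2 + f′(n)/12, so that a single n already gives many correct digits; that acceleration and the helper name are assumptions of the sketch. It evaluates γ itself (f(x) = 1/x), Euler's generalized constant for f(x) = 1/x^a with a = 1/2, and the Stieltjes constant γ_1.

```python
import math


def em_constant(f, fprime, integral, n: int = 100_000) -> float:
    """Estimate lim (Σ_{r=1}^{n} f(r) - ∫_1^n f(x) dx) from a single n, removing the
    leading Euler-Maclaurin corrections f(n)/2 + f'(n)/12."""
    partial = math.fsum(f(r) for r in range(1, n + 1))
    return partial - integral(n) - f(n) / 2 - fprime(n) / 12


# f(x) = 1/x recovers γ.
gamma = em_constant(lambda x: 1 / x, lambda x: -1 / x ** 2, math.log)

# Euler's generalized constant for f(x) = 1/x^a with 0 < a < 1 (here a = 1/2).
a = 0.5
euler_half = em_constant(lambda x: x ** -a,
                         lambda x: -a * x ** (-a - 1),
                         lambda n: (n ** (1 - a) - 1) / (1 - a))

# Stieltjes constant γ_1: f(x) = ln x / x, whose integral from 1 to n is (ln n)^2 / 2.
gamma_1 = em_constant(lambda x: math.log(x) / x,
                      lambda x: (1 - math.log(x)) / x ** 2,
                      lambda n: math.log(n) ** 2 / 2)

print(gamma, euler_half, gamma_1)
```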
It’s a Harmonic World


I tell them that if they will occupy themselves with the study of mathematics they 
will find in it the best remedy against the lusts of the flesh. 

Thomas Mann (1875-1955) 

We will now take a brief look at several of the ways in which H_n appears, and
the pattern of numbers 1, 1/2, 1/3, ... forming its terms appears, in some areas of
considerable diversity. The selection is by no means comprehensive and each
initiative can be developed (in some cases very considerably) beyond where we
leave it, but to delve deeper or to embrace more widely would engulf more pages
than this book could afford. Firstly, though, we ought to address the question
of the name ‘harmonic’.


13.1 Ways of Means 


If one had to write down three examples of an average of two numbers a and b,
it is likely that they would be (in order) their arithmetic, geometric and
harmonic (or subcontrary) means, defined by

$$A = \tfrac{1}{2}(a + b), \qquad G = \sqrt{ab}, \qquad H = \frac{2}{1/a + 1/b},$$

respectively, and there is a nice order to them and relationship between them. 

The Babylonian identity ab = ¼((a + b)² − (a − b)²), which we mentioned
on p. 1, can be rewritten as

$$\left(\frac{a + b}{2}\right)^2 = ab + \left(\frac{a - b}{2}\right)^2,$$

which tells us that

$$\left(\frac{a + b}{2}\right)^2 \ge ab$$

and therefore that A ≥ G. It is also perfectly clear that both A and G lie between
a and b. Also,

$$H = \frac{ab}{\tfrac{1}{2}(a + b)} = \frac{G^2}{A},$$

which gives us the pretty G = √(AH) and the order H ≤ G ≤ A. Chasing
inequalities easily shows that H is at least the smaller of a and b and so
all three means are nicely ordered within the interval, which is reasonable, but
there are other definitions of means. Although our interest is really with H, it
would be a shame to omit at least some mention of where it and the other two
definitions fit into the greater scheme of things.

The famous theorem that bears his name evidences a tiny part of what
Pythagoras of Samos (ca. 569-ca. 475 b.c.) helped to bring to the world, a
sentiment shared by Bertrand Russell (1872-1970), who said,

It is to this gentleman that we owe pure mathematics. The contem- 
plative ideal, since it led to pure mathematics, was the source of a 
useful activity. This increased its prestige and gave it a success in 
theology, in ethics, and in philosophy. 

The distinction between what Pythagoras himself discovered and what was 
discovered by his clandestine society is impossible to make, so secretive were 
they, but it is clear that he knew of the three means that we have mentioned. It 
is also clear that later Pythagoreans defined at least seven more as part of the 
following general schema. 

Given two numbers a and c, define a number b to be a ‘mean’ of the other
two, such that a ≤ b ≤ c. If this inequality holds, then b − a, c − b and c − a
are all non-negative and the Pythagoreans investigated the idea of comparing the ratios of
pairs of these differences with the (not necessarily distinct) ratios of the original
numbers. For example, if we take the ratio

$$\frac{b - a}{c - b} = \frac{a}{a} = \frac{b}{b} = \frac{c}{c},$$

we will arrive at b = ½(a + c), or A. Alternatively, we could take

$$\frac{c - b}{b - a} = \frac{b}{a} = \frac{c}{b}$$

to get b = √(ac), or G. The harmonic mean H emerges from

$$\frac{c - b}{b - a} = \frac{c}{a}.$$

Playing with the possibilities, as no doubt the Pythagoreans did, results in
several novel definitions of mean, for example

$$\frac{c - b}{b - a} = \frac{a}{c}$$

reduces to the elegant symmetric mean

$$S = \frac{a^2 + c^2}{a + c},$$

whereas

$$\frac{c - b}{b - a} = \frac{a}{b}$$

produces the distinctly inelegant and unsymmetrical

$$b = \frac{c - a + \sqrt{5a^2 - 2ac + c^2}}{2}.$$

This last definition recovers a little of its dignity by giving the mean of 1 and
2 as the Golden Ratio φ = 1.618 033 9... . The reader may wish to investigate
the other alternatives, some of which collapse, while others are as strange as
the example above. All but the first three of the definitions have disappeared
through the millennia, but a definition of mean which is important to this day
is missing: the ‘root mean square’

$$R = \sqrt{\frac{a^2 + c^2}{2}},$$

but you can check that this can be recovered as R = √(AS), or from A = √((R² + G²)/2).

Generalizations of the definitions of arithmetic, geometric and harmonic 
means to n numbers are obvious and we will have need of them in later chapters. 

In fact, generalizations of the definition of means exist in the modern day,
notably with

• Hölder’s means, defined by

$$H_p(a, c) = \left(\frac{a^p + c^p}{2}\right)^{1/p}, \quad p \ne 0;$$

• Lehmer’s means, defined by

$$L_p(a, c) = \frac{a^p + c^p}{a^{p-1} + c^{p-1}};$$

• Stolarsky’s means, defined by

$$S_p(a, c) = \left(\frac{a^p - c^p}{pa - pc}\right)^{1/(p-1)}, \quad p \ne 0, 1.$$

And it is easy to see that A = H_1 = L_1 = S_2, G = lim_{p→0} H_p = L_{1/2} = S_{−1}
and H = H_{−1} = L_0.
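The relationships between the means are quick to confirm numerically. This Python sketch uses the arbitrary values a = 2, c = 7 (an assumption; any positive, unequal pair will do), approximates the p → 0 limit of Hölder's mean by a very small p, and finishes by checking the ‘inelegant’ Pythagorean mean for a = 1, c = 2 against the Golden Ratio.

```python
import math


def holder(p, a, c):
    return ((a ** p + c ** p) / 2) ** (1 / p)                    # p != 0


def lehmer(p, a, c):
    return (a ** p + c ** p) / (a ** (p - 1) + c ** (p - 1))


def stolarsky(p, a, c):
    return ((a ** p - c ** p) / (p * a - p * c)) ** (1 / (p - 1))  # p != 0, 1; a != c


a, c = 2.0, 7.0
A, G, H = (a + c) / 2, math.sqrt(a * c), 2 / (1 / a + 1 / c)
S = (a * a + c * c) / (a + c)          # symmetric mean
R = math.sqrt((a * a + c * c) / 2)     # root mean square

print(H <= G <= A, math.isclose(G, math.sqrt(A * H)))
print(holder(1, a, c), lehmer(1, a, c), stolarsky(2, a, c))        # all equal A
print(holder(1e-9, a, c), lehmer(0.5, a, c), stolarsky(-1, a, c))  # all (close to) G
print(holder(-1, a, c), lehmer(0, a, c))                           # both equal H
print(math.isclose(R, math.sqrt(A * S)), math.isclose(A, math.sqrt((R ** 2 + G ** 2) / 2)))

# The 'inelegant' mean from (c - b)/(b - a) = a/b, taking a = 1 and c = 2:
b = (2 - 1 + math.sqrt(5 * 1 ** 2 - 2 * 1 * 2 + 2 ** 2)) / 2
print(b, (1 + math.sqrt(5)) / 2)
```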


13.2 Geometric Harmony 

The Pythagoreans held that ‘All things consist of number’, that is, positive integers
or their ratios, and preferably small integers too. Integers were endowed
with qualitative attributes such as gender that today belong to the world of
the numerologist rather than the mathematician, yet some concepts have carried
over the millennia, for example figurate numbers (square, triangular, cubic,
pyramidal, etc.). His mysticism had the five ‘Pythagorean solids’ known to him,
the cube, tetrahedron, octahedron, icosahedron and dodecahedron, coupled with
earth, fire, air, water and aether (see Table 13.1). The Pythagorean, Philolaus,
is said to have called the cube ‘a geometric harmony’ because the numbers 6, 8
and 12 are in harmonic progression, with 8 the harmonic mean of 6 and 12 (but
then so is the octahedron if the order of the numbers does not matter), which
leads to a nice question (asked and answered by John Webb) about which other
polyhedra are harmonic in the sense of Philolaus.

Table 13.1. Pythagorean solids.

Solid          Faces f   Vertices v   Edges e
Cube              6          8           12
Tetrahedron       4          4            6
Octahedron        8          6           12
Icosahedron      20         12           30
Dodecahedron     12         20           30

Yet another of Euler’s results helps to provide the answer: the fundamentally
important topological fact that for any convex polyhedron the numbers of vertices,
faces and edges are related by v + f = 2 + e. Add to this the condition
of ‘geometric harmony’ that

$$v = \frac{2}{1/e + 1/f}$$

and some of Webb’s judicious algebra and we have that

$$(e - f + 1)^2 - 2(f - 1)^2 = -1,$$

which is Pell’s equation. The continued-fraction approximations of √2 then
yield a list of the infinite number of possibilities for ‘harmonic polyhedra’,
which starts with the values given in Table 13.2. The first one is called a cube...

Table 13.2. Harmonic polyhedra.

Faces f   Edges e   Vertices v
    6        12          8
   30        70         42
  170       408        240
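The list of ‘harmonic polyhedra’ can be generated from the equation directly. Solutions of x² − 2y² = −1 follow from (7, 5) by the standard recurrence (x, y) → (3x + 4y, 2x + 3y); setting f = y + 1, e = x + f − 1 and v = e + 2 − f then gives the triples of Table 13.2 and beyond. This Python sketch (function name illustrative) produces only the number triples; it does not check whether each triple is realized by an actual polyhedron.

```python
def harmonic_polyhedra(count: int):
    """Yield (f, e, v) triples from solutions (x, y) of x^2 - 2y^2 = -1,
    using x = e - f + 1 and y = f - 1, with v fixed by Euler's v + f = 2 + e."""
    x, y = 7, 5  # (1, 1) is also a solution but gives the degenerate f = e = v = 2
    for _ in range(count):
        f = y + 1
        e = x + f - 1
        v = e + 2 - f
        yield f, e, v
        x, y = 3 * x + 4 * y, 2 * x + 3 * y  # next solution of x^2 - 2y^2 = -1


for f, e, v in harmonic_polyhedra(4):
    # 'Geometric harmony': v is the harmonic mean of e and f, i.e. 2ef = v(e + f).
    print(f, e, v, 2 * e * f == v * (e + f))
```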
Table 13.3. Pythagorean musical scale.

Note    do    re    mi      fa    so    la      ti        do
Ratio   1:1   9:8   81:64   4:3   3:2   27:16   243:128   2:1


13.3 Musical Harmony 

The sequence 6, 8 and 12 appears again in the Pythagorean world. If a string
of length 12 units emits a middle C when plucked, the same string shortened 
to 8 units will emit G, a perfect fifth above C, and if the string is shortened 
to 6 units it will emit C an octave up. The young Pythagoras is thought to 
have noticed such behaviour through hearing the variously concordant and 
discordant sounds of blacksmiths’ hammers sounding together and, through his 
subsequent investigations with stretched strings, is credited with the discovery 
that musical intervals which are recognized as concordant are related by small 
integer ratios. More generally, shortening the string by a half gives a frequency
ratio of 2:1, the musical octave; shortening it by a third gives a ratio of 3:2, the
musical fifth; by a quarter, a frequency ratio of 4:3, the musical fourth; and by a
fifth, a frequency ratio of 5:4, the major third. That the arithmetic sequence 1, 2, 3, 4, 5 is
involved only strengthened the belief in the sacred nature of number. It is easy 
to see that the reciprocals of any sequence of numbers in arithmetic progression 
are themselves in harmonic progression; the Pythagoreans knew this too, and 
so we arrive at the modern definition of a harmonic sequence. In the attributed 
words of the Pythagorean, Iamblichus: 

the harmonic mean was then called subcontrary, but which was 
renamed harmonic by the circle of Archytas and Hippasus, because 
it seemed to furnish harmonious and tuneful ratios.

As we discuss the Pythagorean contribution to musical theory we should men- 
tion the musical scale that bears his name, which was based on the view that 
the fifth is a particularly pleasing ratio and that the scale should be constructed 
from it and the 2:1 octave. So, taking ‘fifths of fifths’ and scaling down by 2 
as much as necessary to bring it within the octave brings about Table 13.3 and 
the Greek musical scale of the Pythagorean school. The process will never fill 
the octave (there are plenty of numbers missed, not least the embarrassingly 
irrational √2) and it will never reach an octave exactly since no power of 3/2 is a
power of 2. To be so would mean that (3/2)^n = 2^m or 3^n = 2^{m+n}, which brings
in logarithms with

$$\frac{m + n}{n} = \frac{\ln 3}{\ln 2} = 1.584\,962\ldots,$$

and continued fractions with 1.584 962... = [1, 1, 1, 2, 2, 3, 1, 5, ...] giving
best approximations of

$$\frac{m + n}{n} \approx \frac{1}{1}, \frac{2}{1}, \frac{3}{2}, \frac{8}{5}, \frac{19}{12}, \frac{65}{41}, \ldots$$

Table 13.4. Gradus suavitatis.

Ratio                                                                  Gradus suavitatis
1/2 (octave)                                                            2
3/2 (fifth)                                                             4
4/3 (fourth)                                                            5
5/4 (major third), 5/3 (major sixth)                                    7
6/5 (minor third), 9/8 (major whole tone), 8/5 (minor sixth)            8
10/9 (minor whole tone), 9/5 (minor seventh), 15/8 (major seventh)     10
16/15 (diatonic semitone)                                              11
81/64 (Pythagorean major third), 45/32 (tritone)                       14

Therefore, it is arithmetically optimal to build a scale based on the fifth if we
use 1, 2, 5, 12, 41, ... of them to the octave (quite what they would sound
like is another matter), which means that the Pythagorean scale is not in this
sense optimal. Another way to look at the inexactitude is that in the scale the
interval between successive notes is either a ‘tone’ of 9:8 or a ‘minor semitone’
of 256:243; unfortunately, the semitone is not quite half a tone since two of
them give a frequency ratio of (256:243)² ≠ 9:8. Factorize into primes and the
error lies in the approximation that 2^19 ≈ 3^12 or (3/2)^12 ≈ 2^7, and so going up 12
fifths and then down 7 octaves brings you back to where you started, nearly.
The difference is known as the ‘Pythagorean comma’, which will be our full
stop!
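The arithmetic of this section can be reproduced with exact fractions. The Python sketch below builds the Pythagorean scale of Table 13.3 by stacking fifths and folding them back into the octave, lists the step ratios between neighbouring notes, computes the continued fraction of ln 3/ln 2 with its first convergents, and forms the Pythagorean comma 3^12/2^19. The helper names are illustrative, and the continued fraction is computed from a floating-point value, which is reliable only for its first several terms.

```python
import math
from fractions import Fraction


def into_octave(r: Fraction) -> Fraction:
    """Scale r by powers of 2 until 1 <= r < 2."""
    while r >= 2:
        r /= 2
    while r < 1:
        r *= 2
    return r


# Stack fifths: (3/2)^k for k = -1..5, each brought back into the octave.
scale = sorted(into_octave(Fraction(3, 2) ** k) for k in range(-1, 6)) + [Fraction(2)]
steps = [hi / lo for lo, hi in zip(scale, scale[1:])]
print([str(r) for r in scale])
print([str(s) for s in steps])  # tones 9/8 and 'minor semitones' 256/243


def continued_fraction(x: float, terms: int) -> list:
    cf = []
    for _ in range(terms):
        a = math.floor(x)
        cf.append(a)
        x = 1 / (x - a)
    return cf


def convergents(cf: list) -> list:
    h0, h1, k0, k1 = 0, 1, 1, 0  # seeds h_{-2}, h_{-1}, k_{-2}, k_{-1}
    out = []
    for a in cf:
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        out.append(Fraction(h1, k1))
    return out


cf = continued_fraction(math.log(3) / math.log(2), 8)
print(cf, [str(q) for q in convergents(cf)[:6]])

comma = Fraction(3 ** 12, 2 ** 19)  # 12 fifths up, 7 octaves down
print(comma, float(comma), Fraction(256, 243) ** 2 == Fraction(9, 8))
```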

Harmony was close to Euler’s heart too. During the Middle Ages the quad- 
rivium comprised the four mathematical ‘arts’ — arithmetic, music, geometry, 
astronomy — and constituted the higher part of knowledge, as opposed to the 
trivium , the elementary part, which comprised grammar, rhetoric and dialectic. 
Euler lived later, but it was understandable that a man such as he would take an 
interest in music, particularly as parts of his long life coincided with the lives of 
Bach, Handel, Haydn and Mozart. In 1731, when he was 24 years old, he wrote 
An attempt at a new theory of music, exposed in all clearness according to the 
most well-founded principles of harmony (although it was not published until 
1739) and returned time and again to musical theory, refining and developing 
his thoughts. We will make no great attempt to pursue him here (this study alone 
occupied 263 pages) but simply mention his use of primes in trying to quantify 
the melodiousness, the ‘degree of sweetness’ — or as he called it the ‘gradus 
suavitatis’ — of sounds. The gradus suavitatis of a single note was taken to be 
1 and beyond that, if the frequency ratio of two notes is m:n and if the least
common multiple of m and n is L, he made the definition

$$G(m, n) = 1 + \sum_{\substack{p \text{ prime}\\ p \mid L}} (p - 1),$$

with multiplicities taken into account. For example, G(4, 3) = 1 + (2 − 1) +
(2 − 1) + (3 − 1) = 5 and, more fully, we have Table 13.4.
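Euler's definition is straightforward to code. This Python sketch (helper name illustrative; it uses math.lcm, available from Python 3.9) reproduces the tabulated values for the intervals listed in its loop; applied to 9/5 and 81/64 the same formula gives 9 and 15 rather than the 10 and 14 shown in Table 13.4, so those two rows may rest on a different convention.

```python
from math import lcm  # Python 3.9+


def gradus(m: int, n: int) -> int:
    """Euler's gradus suavitatis: 1 + Σ (p - 1) over the prime factors p of lcm(m, n),
    counted with multiplicity."""
    L = lcm(m, n)
    total, p = 1, 2
    while L > 1:
        while L % p == 0:
            total += p - 1
            L //= p
        p += 1
    return total


for m, n in [(2, 1), (3, 2), (4, 3), (5, 4), (5, 3), (6, 5), (9, 8), (8, 5),
             (10, 9), (15, 8), (16, 15), (45, 32)]:
    print(f"{m}/{n}: {gradus(m, n)}")
```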

Now that we have the name of the series properly established, we will look 
at some of those places in which it appears. 

13.4 Setting Records 

A record is ‘the best there is so far’. There are any number of examples of
sequences of numbers naturally appearing wherein there is ‘improvement’ at
some stage and beyond that another, and H_n naturally appears in the analysis of
them. For example, consider rainfall figures and assume that the rainfall in one
year does not affect that in any subsequent year; that is, annual rainfall figures
are independent and identically distributed random variables. The first year of recording is a record by
definition. In the second year, the rainfall level could equally likely be less or
more than the first year, so the expected number of record years in the first two
years is 1 + 1/2. Continue this reasoning for a third year and we have two of the
six possible orderings of the rainfall for the three years having the third year
as a record and so the expected number of record years is 1 + 1/2 + 1/3 years.
Continue this reasoning for n years, and we have that the expected number of
record years is

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} = H_n.$$

Two arbitrarily chosen examples are revealing. The Radcliffe Meteorological
Station in Oxford has data for rainfall in Oxford between 1767 and 2000 and
there are five record years; this is a span of 234 recorded years and H_234 = 6.03.
For Central Park, New York City, between 1835 and 1994 there are six record
years over the 160-year period and H_160 = 5.66, providing good evidence that
English weather is that bit more unpredictable! An interesting implication of
the surprisingly small values of H_n (for example H_1000 and H_1000000 are 7.49
and 14.39, respectively) is that, without climatic change, record years would be
very rare even over these large time spans.
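The claim that the expected number of records is H_n can be checked exactly for small n by averaging over every ordering of n distinct values, each ordering being equally likely under the independence assumption. This Python sketch (helper names illustrative) does that for n = 6 using exact fractions, then prints the harmonic numbers quoted for the Oxford and Central Park examples.

```python
import math
from fractions import Fraction
from itertools import permutations


def count_records(values) -> int:
    """Number of entries that beat every earlier entry (the first always counts)."""
    best, records = None, 0
    for v in values:
        if best is None or v > best:
            best, records = v, records + 1
    return records


def harmonic(n: int) -> Fraction:
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


# Exact check: averaged over all n! equally likely orderings, the number of records is H_n.
n = 6
perms = list(permutations(range(n)))
average = Fraction(sum(count_records(p) for p in perms), len(perms))
print(average, harmonic(n))

for n in (160, 234, 1_000, 1_000_000):
    print(n, round(math.fsum(1 / k for k in range(1, n + 1)), 2))
```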

The accuracy of the predictions, based on the assumption of statistical inde- 
pendence between readings, can be turned around to itself be a measure of that 
independence. In particular, and to quote Ned Glick, 

... at a 1954 meeting of the Royal Statistical Society, F. G. Foster 
and A. Stuart pointed out that record low and record high annual 
rainfalls in Oxford were much more rare than record breaking performances
(low times or high distances) in annual track and field
competitions of the British Amateur Athletic Association. This con- 
trast is not surprising: athletic recruiting and training have inten- 
sified over the past century; but no one has done much about the 
weather. Although athletic performances do fluctuate, there is an 
average trend over decades for national competitors (and there- 
fore winners) to run faster, jump higher or throw further; while 
weather conditions over a century are more intuitively random, 
without dramatic linear trend. Of course, it is possible for 100 ran- 
dom observations to be ordered so that the sequence has as many 
as 10, 50 or 100 record highs. But detailed calculation shows that 
the probability of 10 or more record highs in a 100-long random
sequence is less than 5%. Therefore, in a situation where data are 
less familiar than rainfalls or race times, the mere finding of many 
record highs or lows suggests that the data are not a simple random 
sample; that is, an alternative hypothesis should be sought to fit 
the data better. Foster and Stuart gave formal procedures using the 
sum or the difference of record high and record low frequencies to 
fit or to test the hypothesis of randomness. Other statisticians have 
also considered such inferential procedures. 


13.5 Testing to Destruction 

Suppose that we have n wooden beams that are to be used as horizontal supports 
in building projects. Naturally, we would want to know how strong they are, with 
the minimum breaking strain the crucial factor. To test this breaking strain we 
can imagine placing a beam on two supports, one at either end, and applying a 
gradually increasing force at its centre; when the beam breaks we will record its 
breaking strain. Applying this technique will assuredly give us the information 
we want but at the cost of the destruction of all of the beams. We will know 
what was, rather than what is, true. A less expensive and more useful approach 
would be this: let the breaking strain of the rth beam be B_r for 1 ≤ r ≤ n; then
we adopt the following procedure.

• Test the first beam to destruction, so that we know B_1.

• Test the second beam by gradually increasing the force to B_1 but no
further. If it survives, we will know that B_2 > B_1; if it breaks, we record
its breaking strain, B_2.

• Test the third beam by gradually increasing the force to min{B_1, B_2} (in effect,
the smallest breaking strain recorded so far). If it breaks, record B_3, otherwise
move on to the next beam.
Using the same reasoning as with record years, given that the strengths of the
beams are independent and identically distributed random variables, the expected
number of beams broken is

$$H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}.$$

So, rather than breaking all 1000 beams, we would expect to break about
H_1000 ≈ 7.5 of them to establish the minimum breaking strain, no doubt to the
delight of the building company. It can also be shown that the variance of the
number of beams broken is H_n − (1 + 1/2² + ⋯ + 1/n²), which for large n is about
H_n − π²/6, with another appearance of π²/6.
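The testing procedure is, in effect, a count of new minima, so it can be checked the same way. This Python sketch (names illustrative) runs the procedure over all orderings of six distinct strengths to confirm the mean H_n and the exact variance H_n − (1 + 1/2² + ⋯ + 1/n²), and then simulates 1000 beams with uniformly random strengths, an assumption chosen only for illustration.

```python
import random
from fractions import Fraction
from itertools import permutations


def beams_broken(strengths) -> int:
    """Follow the text's procedure: load each beam only up to the weakest strength seen so
    far; it breaks (and is counted) exactly when it is weaker than every earlier beam."""
    weakest, broken = None, 0
    for b in strengths:
        if weakest is None or b < weakest:  # fails before the load reaches `weakest`
            weakest, broken = b, broken + 1
    return broken


# Exact mean and variance over all orderings of n distinct strengths.
n = 6
counts = [beams_broken(p) for p in permutations(range(n))]
mean = Fraction(sum(counts), len(counts))
var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
H = sum(Fraction(1, k) for k in range(1, n + 1))
H2 = sum(Fraction(1, k * k) for k in range(1, n + 1))
print(mean == H, var == H - H2)

# A quick simulation with 1000 beams (uniform random strengths, an illustrative assumption).
random.seed(1)
trials = [beams_broken([random.random() for _ in range(1000)]) for _ in range(1000)]
print(sum(trials) / len(trials))  # compare with H_1000 ≈ 7.49
```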

13.6 Crossing the Desert 

We will look at the problem in its form as a puzzle of World War II, solved
by N. J. Fine in 1947, although it dates back further.

You have to cross the desert by jeep. There are no sources of fuel in the desert, 
and you cannot carry enough fuel in a jeep in order to make the crossing in one 
go. You do not have time to establish fuel dumps, but you do have a large supply 
of jeeps and drivers, none of which you want to lose. How can you get across 
the desert, using the minimum amount of fuel? 

We will measure the distance a jeep can travel in terms of a tankful of fuel; one
jeep by itself can travel a distance of one tankful. If two jeeps set out together,
they should travel for 1/3 of a tankful, then Jeep 2 transfers 1/3 of its tankful to
Jeep 1, and returns to base on the remaining 1/3 tankful. Jeep 1 is then able to
travel a total of 1 + 1/3 tankfuls.

With three jeeps, they should stop after travelling 1/5 of a tankful, then transfer
1/5 of a tankful from Jeep 3 into each of Jeeps 1 and 2, which are now full.
Jeep 3 now has 2/5 of a tankful. Jeeps 1 and 2 now proceed as before, with Jeep 2
returning with an empty tank to Jeep 3. Between them, they have enough fuel to
get back to base. Meanwhile, Jeep 1 has travelled a total of 1 + 1/3 + 1/5 tankfuls.

The same reasoning shows that with four jeeps you can achieve a distance
of 1 + 1/3 + 1/5 + 1/7 tankfuls, and with n jeeps you can get a jeep across a desert
that is

$$1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n - 1}$$

tankfuls wide. The divergence of the series means that with this system of
transferring fuel, we can effect the crossing of a desert of arbitrarily large
size, as long as there are enough jeeps and drivers.
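The fuel bookkeeping behind these fractions can be written down stage by stage. With k jeeps together, the last jeep tops up the other k − 1, and what it keeps must pay for itself and the k − 2 jeeps that later return to that point, giving a stage of length 1/(2k − 1). This Python sketch (helper names illustrative) uses exact fractions and also finds how many jeeps the scheme needs for a given width.

```python
from fractions import Fraction


def stage_length(k: int) -> Fraction:
    """Distance d covered while k jeeps travel together. The last jeep burns d, gives d to
    each of the other k - 1 and keeps 1 - k*d; that must later take k - 1 jeeps (itself and
    the k - 2 that come back empty) back a distance d, so 1 - k*d = (k - 1)*d."""
    return Fraction(1, 2 * k - 1)


def crossing_distance(n: int) -> Fraction:
    """Total distance the lead jeep covers with n jeeps: 1 + 1/3 + ... + 1/(2n - 1)."""
    return sum((stage_length(k) for k in range(1, n + 1)), Fraction(0))


def jeeps_needed(width) -> int:
    """Smallest number of jeeps for which the scheme crosses a desert of this width."""
    n = 1
    while crossing_distance(n) < width:
        n += 1
    return n


print(crossing_distance(2), crossing_distance(3), crossing_distance(4))
print(jeeps_needed(2), jeeps_needed(3))
```

Because the partial sums grow only like a logarithm, each extra tankful of width multiplies the number of jeeps needed by roughly e².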

13.7 Shuffling Cards 

A ‘top in at random’ shuffle is one in which the top card of a card deck of n 
cards is removed and inserted at random in the deck. How many times must 
this shuffle be repeated before we can regard the deck as ‘random’?
We follow the progress of the card which is initially at the bottom of the deck.
This card (label it B) stays at the bottom until another card is inserted below
it. Since there are n places into which a card taken from the top can go, the
chance that it will go below B is $1/n$, and therefore on average it will take n
‘top in at random’ shuffles before a card is placed below B. With this done, the
chance that a card taken from the top and inserted at random into the deck will
go in below B is $2/n$, since there are now two places below B; the expected
number of shuffles needed to get a second card below B is $n/2$, and the expected
number of shuffles needed to get two cards below B is $n + n/2$. Note that at
this stage the cards below B are in random order. Continuing in this way, we
see that the expected number of ‘top in at random’ shuffles needed to get B up
to the top of the deck is


$$\frac{n}{1} + \frac{n}{2} + \frac{n}{3} + \cdots + \frac{n}{n-1}.$$

At this stage the cards below B are in random order, and just one more shuffle,
which puts B at random into the deck, is needed to randomize the deck. The
total number of shuffles needed is therefore

$$\frac{n}{1} + \frac{n}{2} + \frac{n}{3} + \cdots + \frac{n}{n-1} + 1 = n\left(1 + \frac12 + \frac13 + \cdots + \frac{1}{n-1} + \frac1n\right) = nH_n.$$

For an ordinary bridge deck of 52 cards this makes it about 236 ‘shuffles’.
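The count $nH_n$ can be set against a direct simulation. In this Python sketch (the function names are our own), a deck is a list with the top card first, and the stopping rule is exactly the one in the text: keep shuffling until card B has reached the top and been inserted once more.

```python
import random
from fractions import Fraction

def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def expected_shuffles(n: int) -> Fraction:
    """n/1 + n/2 + ... + n/(n-1) for B to reach the top, plus 1 final shuffle."""
    return sum((Fraction(n, k) for k in range(1, n)), Fraction(0)) + 1

def simulate_shuffles(n: int, rng: random.Random) -> int:
    """Count 'top in at random' shuffles until the original bottom card B has
    reached the top and then been inserted once more."""
    deck = list(range(n))          # deck[0] is the top card
    bottom_card = n - 1            # B starts at the bottom
    shuffles = 0
    while True:
        card = deck.pop(0)
        deck.insert(rng.randrange(n), card)   # n equally likely places
        shuffles += 1
        if card == bottom_card:               # B was on top and is now placed
            return shuffles

rng = random.Random(2024)
trials = [simulate_shuffles(52, rng) for _ in range(2000)]
print(float(expected_shuffles(52)), sum(trials) / len(trials))
```

The simulated average should sit close to the exact $52H_{52} \approx 235.98$; the fixed seed keeps the run repeatable.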


13.8 Quicksort 

Of the many different algorithms that have been devised to sort an array of data, 
Quicksort (devised by C. A. R. Hoare) is favoured more than most because the 
time that it takes to perform a sort is usually comparatively short. 

The general idea of Quicksort is that an item of the array, called the pivot
point, is selected and the array divided into two: all items with a value less
than the pivot point are moved to (or remain on) its left, and all items with a value
greater than the pivot point are moved to (or remain on) its right. The process is then
repeated in each sub-array until the data are sorted, which occurs
when the length of each sub-array is 1; no effort is made to arrange the data
within each sub-array. To look at the mathematics involved, write $T_n$ for the average
time for the algorithm to sort a list of n items arranged in some unknown order.
Suppose that the rth smallest element of the list is chosen as the initial pivot point (which
we assume takes 1 unit of comparison time); then we need $n - 1$ comparisons
to divide the data into the two partitions, plus the 1, and so

$$T_n = n + T_{r-1} + T_{n-r}, \quad r = 1, 2, \ldots, n, \quad \text{with } T_0 = 0.$$
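We can eliminate r by summing over it (each value of r being equally likely) to give

$$\sum_{r=1}^{n} T_n = \sum_{r=1}^{n} n + \sum_{r=1}^{n}\left(T_{r-1} + T_{n-r}\right),$$

$$nT_n = n^2 + \sum_{r=1}^{n} T_{r-1} + \sum_{r=1}^{n} T_{n-r} = n^2 + 2\sum_{r=0}^{n-1} T_r,$$

$$T_n = n + \frac{2}{n}\sum_{r=0}^{n-1} T_r.$$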


This makes

$$nT_n - (n-1)T_{n-1} = n\left\{n + \frac{2}{n}\sum_{r=0}^{n-1} T_r\right\} - (n-1)\left\{(n-1) + \frac{2}{n-1}\sum_{r=0}^{n-2} T_r\right\}$$

$$= n^2 - (n-1)^2 + 2\sum_{r=0}^{n-1} T_r - 2\sum_{r=0}^{n-2} T_r = 2n - 1 + 2T_{n-1}$$

and so we get

$$nT_n = 2n - 1 + 2T_{n-1} + (n-1)T_{n-1} = (n+1)T_{n-1} + 2n - 1, \quad n = 1, 2, \ldots$$

with $T_0 = 0$.

A magical leap avoids the world of recurrence relations and we state the
solution

$$T_n = 2(n+1)\sum_{r=1}^{n+1}\frac{1}{r} - 3n - 2, \quad \text{for } n \ge 1,$$

which we can check. If it holds for $T_{n-1}$, then

$$(n+1)T_{n-1} + 2n - 1 = (n+1)\left\{2n\sum_{r=1}^{n}\frac{1}{r} - 3(n-1) - 2\right\} + 2n - 1$$

$$= 2n(n+1)\sum_{r=1}^{n}\frac{1}{r} + (n+1)(-3n+1) + 2n - 1$$

$$= 2n(n+1)\sum_{r=1}^{n}\frac{1}{r} - 3n^2 - 2n + 1 + 2n - 1$$

$$= 2n(n+1)\sum_{r=1}^{n+1}\frac{1}{r} - 2n - 3n^2 = nT_n.$$
Figure 13.1.


And we are done, since we also have

$$T_0 = 2(0+1)\sum_{r=1}^{1}\frac{1}{r} - 3\times 0 - 2 = 0.$$

Using this we can give a measure of the efficiency of Quicksort by estimating
$H_n$ by the natural logarithm (with n replaced by $n + 1$) for large n:

$$T_n = 2(n+1)\sum_{r=1}^{n+1}\frac{1}{r} - 3n - 2 \approx 2(n+1)\left\{\ln(n+1) + \gamma - \frac{3n+2}{2(n+1)}\right\} = O(n\ln n).$$


This compares pretty favourably with alternatives like the simple Bubble sort,
where the same average is about $O(n^2)$; of course, worst-case scenarios can
happen, which force the $n\ln n$ towards $n^2$.
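The closed form and the recurrence can be cross-checked exactly with rational arithmetic, and the cost model itself can be simulated. In this Python sketch (all names are our own), a call on m items is charged m units, mirroring the text's "1 unit for the pivot plus $n - 1$ comparisons"; the pivot is chosen uniformly at random, which is what makes its rank r uniformly distributed.

```python
import random
from fractions import Fraction

def t_by_recurrence(n: int) -> Fraction:
    """Iterate n*T_n = (n + 1)*T_{n-1} + 2n - 1 from T_0 = 0."""
    t = Fraction(0)
    for k in range(1, n + 1):
        t = ((k + 1) * t + 2 * k - 1) / k
    return t

def t_closed_form(n: int) -> Fraction:
    """T_n = 2(n + 1)H_{n+1} - 3n - 2."""
    h = sum((Fraction(1, r) for r in range(1, n + 2)), Fraction(0))
    return 2 * (n + 1) * h - 3 * n - 2

def quicksort_with_cost(items, rng):
    """Random-pivot Quicksort returning (sorted list, cost), where a call on
    m items costs m units: 1 to choose the pivot plus m - 1 comparisons."""
    if not items:
        return [], 0
    i = rng.randrange(len(items))
    pivot = items[i]
    rest = items[:i] + items[i + 1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    sorted_left, cost_left = quicksort_with_cost(left, rng)
    sorted_right, cost_right = quicksort_with_cost(right, rng)
    return sorted_left + [pivot] + sorted_right, len(items) + cost_left + cost_right

rng = random.Random(7)
data = list(range(100))
rng.shuffle(data)
costs = []
for _ in range(500):
    result, cost = quicksort_with_cost(data, rng)
    costs.append(cost)
print(float(t_closed_form(100)), sum(costs) / len(costs))
```

With distinct items, the simulated mean cost should land near $T_{100} \approx 747.85$.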


13.9 Collecting a Complete Set 

There are many occasions when, as a marketing ploy, sets of objects are
distributed among products to encourage sales, particularly among children. We
will model the situation with packets of breakfast cereal and suppose that there
are n distinct toys distributed randomly (which is a big assumption), one to
each box, and among an unlimited number of boxes. The question is: what is
the expected number of boxes that must be bought for the child to collect the
whole set of toys?

First, we need a preliminary result. The infinite geometric series

$$1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x} \quad \text{for } |x| < 1,$$

which we have used several times before, can legitimately be differentiated with
respect to x to give

$$1 + 2x + 3x^2 + 4x^3 + \cdots = \frac{1}{(1-x)^2},$$

with the same range of convergence. Now to the problem at hand.

Let $E_r$ be the expected number of boxes to be opened to collect the rth new
toy, once $r - 1$ different toys have been found. Pictorially, see Figure 13.1. Since the first box must yield a new toy, $E_1 = 1$.
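Then, since each box yields a new toy with probability $(n-1)/n$ and a toy already
found with probability $1/n$,

$$E_2 = 1\times\frac{n-1}{n} + 2\times\frac{1}{n}\times\frac{n-1}{n} + 3\times\left(\frac{1}{n}\right)^2\frac{n-1}{n} + 4\times\left(\frac{1}{n}\right)^3\frac{n-1}{n} + \cdots$$

$$= \frac{n-1}{n}\left\{1 + 2\left(\frac{1}{n}\right) + 3\left(\frac{1}{n}\right)^2 + 4\left(\frac{1}{n}\right)^3 + \cdots\right\}.$$

Putting $x = 1/n$ in the above result then gives

$$E_2 = \frac{n-1}{n}\times\frac{1}{(1-1/n)^2} = \frac{n}{n-1}.$$

Continuing this argument,

$$E_3 = 1\times\frac{n-2}{n} + 2\times\frac{2}{n}\times\frac{n-2}{n} + 3\times\left(\frac{2}{n}\right)^2\frac{n-2}{n} + 4\times\left(\frac{2}{n}\right)^3\frac{n-2}{n} + \cdots$$

$$= \frac{n-2}{n}\left\{1 + 2\left(\frac{2}{n}\right) + 3\left(\frac{2}{n}\right)^2 + 4\left(\frac{2}{n}\right)^3 + \cdots\right\} = \frac{n-2}{n}\times\frac{1}{(1-2/n)^2} = \frac{n}{n-2},$$

and, in general, $E_r = n/(n-r+1)$.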

And so the expected number of cereal boxes that must be bought to collect the
whole set of toys is

$$T_n = E_1 + E_2 + E_3 + \cdots + E_n = \frac{n}{n} + \frac{n}{n-1} + \frac{n}{n-2} + \cdots + \frac{n}{1} = nH_n.$$

A non-random distribution would, of course, increase this number. The reader
could model this by, for example, throwing a fair six-sided die until all six
numbers have shown uppermost, in which case $n = 6$ and $T_6 = 14.7$. Using
an ordinary bridge deck of cards and cutting until all cards have shown would
require $n = 52$ and a lot more patience: $T_{52} \approx 236$.
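Each stage is a geometric waiting time, so both the series argument and the final total can be checked numerically. This Python sketch (names ours) truncates the differentiated geometric series for a stage, compares it with $n/(n-r+1)$, and simulates the six-sided-die version of the experiment.

```python
import random
from fractions import Fraction

def stage_expectation(n: int, r: int, terms: int = 5000) -> float:
    """E_r from the truncated series p * (1 + 2q + 3q^2 + ...), where
    p = (n - r + 1)/n is the chance that a box holds a new toy and q = 1 - p."""
    p = (n - r + 1) / n
    q = 1 - p
    return p * sum(k * q ** (k - 1) for k in range(1, terms + 1))

def expected_boxes(n: int) -> Fraction:
    """T_n = E_1 + E_2 + ... + E_n with E_r = n/(n - r + 1), which equals n * H_n."""
    return sum((Fraction(n, n - r + 1) for r in range(1, n + 1)), Fraction(0))

def boxes_until_complete(n: int, rng: random.Random) -> int:
    """Open boxes (each of the n toys equally likely) until all have been seen."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        boxes += 1
    return boxes

rng = random.Random(6)
rolls = [boxes_until_complete(6, rng) for _ in range(20000)]
print(expected_boxes(6), sum(rolls) / len(rolls))   # exact value vs simulated mean
```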


13.10 A Putnam Prize Question 

The William Lowell Putnam Mathematical Competition is an annual contest for 
college students in America, established in 1938 in memory of its namesake. 
It awards cash prizes to both individuals and teams. Problem B5 of the 1992
Putnam competition involved determinants and was the following. Is $A_n/n!$
bounded, where
We show the essential steps of an elegant solution of this problem and leave it
to the interested reader to provide the details of the uses of determinants! 
$$= n!\,H_n.$$

So, $A_n/n! = H_n$, and since $H_n$ diverges, the answer is that $A_n/n!$ is unbounded.
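The determinant itself is not reproduced above. Assuming it is the matrix from the published 1992 problem, an $(n-1)\times(n-1)$ matrix with diagonal entries $3, 4, \ldots, n+1$ and every other entry equal to 1 (this is our statement of the problem, not taken from this text), the identity $A_n = n!\,H_n$ can be spot-checked with exact rational elimination. A minimal Python sketch, with names of our own:

```python
from fractions import Fraction
from math import factorial

def det(matrix) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size = len(a)
    result = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:                 # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, size):   # clear the entries below the pivot
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result

def putnam_matrix(n: int):
    """ASSUMED matrix (not given in this text): (n-1) x (n-1),
    diagonal 3, 4, ..., n+1 and ones elsewhere."""
    m = n - 1
    return [[i + 3 if i == j else 1 for j in range(m)] for i in range(m)]

def harmonic(n: int) -> Fraction:
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

for n in range(2, 8):
    value = det(putnam_matrix(n))
    print(n, value, value / factorial(n) == harmonic(n))
```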
13.11 Maximum Possible Overhang 

If a stack of playing cards (for example) is placed on the edge of a table and
made to overhang as in Figure 13.2, we can ask the question: what is the
biggest overhang possible? Suppose that the cards are 2 units wide. Clearly, we
get maximum overhang from one card to the next if the cards above are displaced
so that their combined centre of gravity is just above the edge of the card they rest on. Let
$d_r$ be the distance from the right edge of the top card to the same edge of the
rth card from the top. Then $d_1 = 0$ and, if $d_{r+1}$ is to be the centre of gravity of
the first r cards (each card's centre lying 1 unit from its right edge),


$$d_{r+1} = \frac{(d_1 + 1) + (d_2 + 1) + (d_3 + 1) + \cdots + (d_r + 1)}{r}, \quad 1 \le r \le n.$$


Hence

$$rd_{r+1} = r + d_1 + d_2 + \cdots + d_r, \quad r \ge 1,$$

and

$$(r-1)d_r = r - 1 + d_1 + d_2 + \cdots + d_{r-1}, \quad r \ge 1.$$

Subtracting gives

$$rd_{r+1} - (r-1)d_r = 1 + d_r, \quad r \ge 1.$$

And, therefore,

$$d_{r+1} = d_r + \frac{1}{r}, \quad r \ge 1,$$

the recurrence that generates the harmonic series, and so $d_{r+1} = H_r$; setting
$r = n$ gives $H_n$ as the total overhang, and again the divergence of $H_n$ means
that theoretically the overhang can be as large as we please.
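The recurrence can be confirmed without the algebra by computing each $d_{r+1}$ straight from the centre-of-gravity condition. A Python sketch, with names of our own; it also finds how many cards are needed for a given overhang, measured in the same units (cards 2 units wide).

```python
from fractions import Fraction

def overhang_offsets(n: int) -> list:
    """d[1], ..., d[n+1] straight from the centre-of-gravity condition:
    d_1 = 0 and d_{r+1} = mean of (d_j + 1) over j = 1..r (cards 2 units wide)."""
    d = [None, Fraction(0)]            # index 0 unused so that d[r] matches d_r
    for r in range(1, n + 1):
        d.append(sum(d[j] + 1 for j in range(1, r + 1)) / r)
    return d

def cards_for_overhang(target: float) -> int:
    """Smallest n with H_n >= target, i.e. enough cards for that overhang."""
    n, h = 0, 0.0
    while h < target:
        n += 1
        h += 1.0 / n
    return n

d = overhang_offsets(6)
print([str(x) for x in d[1:]])   # 0, 1, 3/2, 11/6, 25/12, 137/60, 49/20
print(cards_for_overhang(2))     # overhang of one full card width
```

Four cards already give an overhang of more than one full card width, since $H_4 = \frac{25}{12} > 2$.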

13.12 Worm on a Band 

This intriguing problem seems to have been invented by Denys Wilquin in 
1972. A (mathematical) worm starts at the end of a (mathematical) rubber band 
of initial length 1 m. The worm crawls at a constant 1 cm min$^{-1}$ and at the
end of each minute the band instantly stretches by 1 m. So, just after 1 min of
crawling the worm is 1 cm from the start and 99 cm from the end, but the band
then instantly stretches by 1 m with the worm stationary relative to it, and as it
is 1% from the start and 99% from the end it is 2 cm from the start and 198 cm
from the end. Just at the end of the second minute the worm is 3 cm from the
start and 197 cm from the end, that is, $(1 + \frac12)\% = 1.5\%$ from the start and
98.5% from the end; the band stretches again and becomes 3 m long; the worm
is, therefore, 4.5 cm from the start and 295.5 cm from the end. Just at the end
of the third minute the worm is 5.5 cm from the start and 294.5 cm from the
end, that is, $(1 + \frac12 + \frac13)\% = 1\frac56\%$ from the start and $98\frac16\%$ from the end. And
so the process continues. The question is, does the worm ever reach the end?
The answer relies critically on the fact that when the rubber band stretches the
percentage of it that the worm has crawled along remains constant; therefore,
he crawls $\frac{1}{100}$th of its length in the first minute, $\frac{1}{200}$th in the second minute,
$\frac{1}{300}$th in the third, etc. So, after n minutes the fraction of the band that he has
crawled is

$$\frac{1}{100}\left(\frac11 + \frac12 + \frac13 + \cdots + \frac1n\right) = \frac{H_n}{100}.$$

Again, using the logarithmic estimate for $H_n$, we have that $H_n = 100$ when
$\ln n + \gamma \approx 100$, which is when $n \approx e^{100-\gamma}$, roughly $1.5\times 10^{43}$ minutes. Our tireless worm will
need longer than the estimated life of the Universe to complete his journey.
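The minute-by-minute bookkeeping above can be replayed exactly with rational arithmetic. In this Python sketch (names ours), the band's length and the worm's distance from the start are tracked in centimetres, the worm is carried proportionally at each stretch, and Euler's constant is hard-coded.

```python
import math
from fractions import Fraction

EULER_GAMMA = 0.5772156649015329   # Euler's constant, hard-coded

def worm_after(n: int):
    """State just before the stretch at the end of minute n:
    (cm from the start, band length in cm, fraction of the band covered)."""
    position, length = Fraction(0), Fraction(100)
    for minute in range(1, n + 1):
        position += 1                       # crawl 1 cm during the minute
        if minute < n:                      # then the band stretches by 100 cm,
            new_length = length + 100       # carrying the worm proportionally
            position = position * new_length / length
            length = new_length
    return position, length, position / length

pos, length, frac = worm_after(3)
print(pos, length - pos, frac)              # 5.5 cm covered, 294.5 cm to go

# H_n = 100 needs roughly ln n + gamma = 100
minutes = math.exp(100 - EULER_GAMMA)
print(f"about 10^{math.log10(minutes):.1f} minutes")
```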

13.13 Optimal Choice 

This final, surprising appearance of the harmonic series is remarkable for its 
particularly counterintuitive nature and appears in many forms: picking a
secretary, a suitor, a car, a restaurant, etc. The common ground is that there is a
list from which a single choice has to be made, the list is randomly ordered 
and there is a single best choice — and we would like to make it. We could, 
of course, appraise each candidate and so guarantee success, or at the other 
end of the spectrum we could be lazy and simply pick one at random; if there 
are n choices in total, the chances of success in picking the best would then 
be 1 and 1/n, respectively. Is there an optimal strategy that fits somewhere in 
between, making us work a bit — but not too much? The answer is ‘yes’, and a 
very elegant ‘yes’ too. That strategy is to reject the first r candidates on the list 
and then choose the first candidate better than the best reject. 

Why is this sensible and when is it optimal? What value does r have? Suppose
that the best candidate is B. Then we will fail if B is among the first r candidates:
since all subsequent candidates will be compared with B, none of them will be better,
and we will inevitably have to choose the lucky nth candidate. Otherwise we have a
chance of success, but what chance? The answer depends on where B is among the remaining
choices and we need to deal with each possibility separately.

If B happens to be in the $(r+1)$th position, we will choose it for certain; this
happens with probability $1/n$. Now suppose that B is in the $(r+2)$th position;
then if the occupant of the $(r+1)$th position is the best yet we will fail in our
goal by choosing it, otherwise we will choose B nonetheless. This means that
we will choose B if the best yet among the first $r + 1$ choices lies among the
first r of them; this occurs with probability $r/(r+1)$. We do need B to be in
the $(r+2)$th position, and this happens with that same probability of $1/n$, so
the total probability of success in this case is

$$\frac{1}{n}\times\frac{r}{r+1}.$$

Now the process continues, supposing that B is in the $(r+3)$th, $(r+4)$th, ...,
nth position, giving the probabilities of success as

$$\frac{1}{n}\times\frac{r}{r+2}, \quad \frac{1}{n}\times\frac{r}{r+3}, \quad \ldots, \quad \frac{1}{n}\times\frac{r}{n-1}.$$

The total probability of success (that is, of choosing B) using this strategy is
then

$$P(n, r) = \frac{1}{n}\left(1 + \frac{r}{r+1} + \frac{r}{r+2} + \frac{r}{r+3} + \cdots + \frac{r}{n-1}\right).$$
For any given n it is this probability that we wish to maximize as r varies from
0 to $n - 1$, and the harmonic series is evidently making another appearance. In
terms of it we have

$$P(n, r) = \frac{1}{n}\left\{1 + r\left(H_{n-1} - H_r\right)\right\}.$$

This is easily computed for small values of n, for example, $n = 5, 10, 100,
1000$, and the behaviour is shown in Figure 13.3.

The points have been joined to emphasize the behaviour. We can clearly see a
trend appearing, with the maximum value of $P(n, r)$ decreasing from just over
0.4 to something under it and achieved at a value of r slightly more than a third
of n. Table 13.5 gives the maximum probabilities and the values of r at which
they are achieved for the first few and several larger values of n.

From this we can see that the strategy results in a probability of success
(that is, of choosing B) of about 37%, no matter how large n is; it may not
be certainty but it is a great deal better than the diminishing $1/n$ of the random
guess.
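Rows of Table 13.5 can be regenerated by brute force from the formula for $P(n, r)$. A Python sketch with names of our own; `math.fsum` is used only to keep the partial harmonic sums accurate, and ties between values of r are resolved towards the smaller r.

```python
import math

def success_probability(n: int, r: int) -> float:
    """P(n, r) = (1 + r(H_{n-1} - H_r))/n: reject the first r of n candidates,
    then take the first one better than all of them (r = 0: take the first)."""
    tail = math.fsum(1.0 / k for k in range(r + 1, n))   # H_{n-1} - H_r
    return (1 + r * tail) / n

def best_r(n: int):
    """Brute-force maximum over r = 0, ..., n - 1; ties go to the smaller r."""
    best = (0, success_probability(n, 0))
    for r in range(1, n):
        p = success_probability(n, r)
        if p > best[1] + 1e-12:
            best = (r, p)
    return best

for n in (1, 2, 3, 4, 5, 10, 20, 50, 100):
    r, p = best_r(n)
    print(f"n = {n:4d}   r = {r:3d}   P = {p:.6f}")
```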

The full analysis of the problem again has us approximating $H_n$ by the
natural logarithm for large n (and therefore large enough r):

$$P(n, r) \approx \frac{1}{n}\left\{1 + r\left([\ln(n-1) + \gamma] - [\ln r + \gamma]\right)\right\} = \frac{1}{n}\left\{1 + r\ln\frac{n-1}{r}\right\}.$$

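If we treat r as a continuous variable, we can use calculus to find the approximate
coordinates of the maximum that we have seen in the plots:

$$\frac{dP(n, r)}{dr} = \frac{1}{n}\left\{\ln\frac{n-1}{r} + r\times\frac{r}{n-1}\times\left(-\frac{n-1}{r^2}\right)\right\} = \frac{1}{n}\left\{\ln\frac{n-1}{r} - 1\right\},$$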
Figure 13.3. Continuous form of P(n, r).

Table 13.5. Optimal choice table of values.

n        Opt. r    Max. P(n, r)
1        0         1.000
2        0/1       0.500
3        1         0.500
4        1         0.458
5        2         0.433
6        2         0.428
7        2         0.414
8        3         0.410
9        3         0.406
10       3         0.399
20       7         0.384
50       18        0.374
100      37        0.371043
200      73        0.369461
300      110       0.368935
400      147       0.368671
500      184       0.368512
1000     368       0.368195
5000     1839      0.367942
10000    3679      0.367911
Table 13.6. Another optimal choice table of values.

n        Opt. r    Max. P(n, r)
2        1         0.500
3        1         0.500
8        3         0.4098
11       4         0.3984
19       7         0.3850
87       32        0.3715
106      39        0.3709
193      71        0.3695
1264     465       0.36813
1457     536       0.36810


which means that for any stationary points, $\ln\{(n-1)/r\} = 1$ and so $(n-1)/r = e$,
and there isn't much lost in saying $n/r = e$. So, the optimal r is about $n/e$ and
gives the maximum $P(n, r)$ approximately as

$$\frac{1}{n}\left\{1 + r\ln\frac{n-1}{r}\right\} \approx \frac{r}{n}\ln\frac{n}{r} \approx \frac{1}{e} \approx 0.37,$$

the base of Napier's logarithms.

A little intrigue remains, though. The continued-fraction representation of
$1/e$ is just that of e shifted one place, and so it is

$$\frac{1}{e} = [0; 2, 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, 12, \ldots].$$

This means that the first few convergents are

$$\frac12,\ \frac13,\ \frac38,\ \frac{4}{11},\ \frac{7}{19},\ \frac{32}{87},\ \frac{39}{106},\ \frac{71}{193},\ \frac{465}{1264},\ \frac{536}{1457},\ \ldots$$

Table 13.5 lists some values of n and the corresponding optimal r, together
with the value of $P(n, r)$. Another selection of values of n yields the equivalent
Table 13.6.

The selection of the n is hardly arbitrary: they are, of course, the denominators
of the convergents of $1/e$, and the optimum r are nothing other than the
corresponding numerators. A bigger test is $n = 14\,665\,106$, the denominator
of the 20th convergent of $1/e$; the numerator is $5\,394\,991$, and guess what the
optimum r is? Correct. It is reasonable, but why is it true?
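The convergents and the matching optimal r can both be computed. In this Python sketch (names ours), `optimal_r` relies on a small rearrangement of the formula for $P(n, r)$ that is ours rather than the book's: $n\{P(n, r+1) - P(n, r)\} = H_{n-1} - H_r - 1$, so P keeps increasing until $H_{n-1} - H_r$ first drops to 1 or below.

```python
def inv_e_partial_quotients(count: int) -> list:
    """First `count` partial quotients of 1/e = [0; 2, 1, 2, 1, 1, 4, 1, 1, 6, ...]."""
    terms = [0, 2]
    m = 1
    while len(terms) < count:
        terms += [1, 2 * m, 1]
        m += 1
    return terms[:count]

def convergents(terms: list) -> list:
    """(numerator, denominator) of each convergent, via p_k = a_k p_{k-1} + p_{k-2}."""
    p_prev, p = 0, 1          # p_{-2}, p_{-1}
    q_prev, q = 1, 0          # q_{-2}, q_{-1}
    out = []
    for a in terms:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out

def optimal_r(n: int) -> int:
    """Smallest r with H_{n-1} - H_r <= 1, i.e. where P(n, r) stops increasing
    (a tie between r and r + 1 is resolved towards the smaller r)."""
    r, tail = n - 1, 0.0      # tail holds H_{n-1} - H_r
    while r > 0 and tail + 1.0 / r <= 1.0:
        tail += 1.0 / r       # step from r down to r - 1
        r -= 1
    return r

for p, q in convergents(inv_e_partial_quotients(11))[2:]:
    print(f"n = {q:5d}   numerator = {p:4d}   optimal r = {optimal_r(q):4d}")
```

The same function can be run for $n = 14\,665\,106$, though it then takes several million steps.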

A peculiar feature of the procedure is that every candidate can be told the
outcome of the interview immediately at its end (if they actually get an
interview, that is!). If we wish to sacrifice this feature, we can look at things slightly
differently. Suppose that we replace the verb ‘reject’ with the alternative verb
‘reserve’; then, if the best candidate is within the first r interviewed, we will
inevitably continue to interview to the last candidate, but having done that we
would choose that best candidate anyway from our initial reserves. Of course,
in a sense the procedure has failed because we will have enjoyed no saving of
effort, but we can ask, for example, what the value of r should be to ensure that
the odds of success are just in our favour. Now, we have

$$P(n, r) = \frac{1}{n}\left\{1 + r\left(H_{n-1} - H_r\right)\right\} + \frac{r}{n},$$

with the additional term being the probability that the best candidate is within the
first r interviewed. Even with the logarithmic approximation, the equation
$P(n, r) = \frac12$ cannot be solved for r analytically, but we can see what is happening with,
for example, $n = 1000$.

The function rises to its maximum of 1 at the end of the range (unsurprisingly), but we are
interested in where it equals $\frac12$, and a bit of computation reveals that it
passes through $\frac12$ between $r = 186$ and $r = 187$; continuing to higher n indicates that we have an
asymptotic form $r/n \approx 0.186\,682\,2\ldots$, whatever that is. In summary, if we
apply the procedure, having automatically interviewed a bit under 20% of the
candidates we have an even chance of picking the best of them!
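The "reserve" version is just as easy to explore numerically. The Python sketch below (names ours) finds where $P(n, r)$ first reaches $\frac12$ for $n = 1000$. Assuming the logarithmic approximation $P \approx (r/n)\ln(n/r) + r/n$, it also obtains the limiting ratio by bisection as the root of $x(1 - \ln x) = \frac12$ with $x = r/n$.

```python
import math

def reserve_probability(n: int, r: int) -> float:
    """Success probability when the first r candidates are reserved, not rejected:
    (1 + r(H_{n-1} - H_r))/n + r/n."""
    tail = math.fsum(1.0 / k for k in range(r + 1, n))   # H_{n-1} - H_r
    return (1 + r * tail) / n + r / n

def first_even_chance(n: int) -> int:
    """Smallest r whose success probability is at least 1/2."""
    return next(r for r in range(n) if reserve_probability(n, r) >= 0.5)

def limiting_ratio(lo: float = 0.01, hi: float = 0.5, steps: int = 60) -> float:
    """Bisection for x(1 - ln x) = 1/2; the left side increases on 0 < x < 1."""
    def g(x: float) -> float:
        return x * (1 - math.log(x)) - 0.5
    for _ in range(steps):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(first_even_chance(1000), reserve_probability(1000, 186), reserve_probability(1000, 187))
print(limiting_ratio())
```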
It’s a Logarithmic World


How can it be that mathematics, being after all a product of human thought
independent of experience, is so admirably adapted to the objects of reality? 

Albert Einstein (1879-1955) 

As we have mentioned in the Introduction, the reader of this book will need little 
convincing that logarithms appear with great frequency in mathematics and its 
applications, particularly with so many differential equations involving them or 
exponentials in their solution. Power laws abound in nature: Kepler’s third law, 
the universal law of gravitation, Boyle’s Law, etc. A browse through any science 
book will yield any number of examples, and where there is a power law, there 
is a linearizing logarithm, as Kepler may have experienced. The intensity of 
earthquakes is measured on the logarithmic Richter scale, fractal dimension is 
defined in terms of logarithms, distance in the Poincaré model of hyperbolic
geometry is logarithmic, and so the list continues. The final two chapters are 
devoted to one particular and major use of them as a measure of the number 
of primes below any number. Here we will look at three other examples of 
them forcing themselves into the solution of a problem. They can hardly be 
representative, but each has a novel appeal and each has been developed into 
important ideas. 

14.1 A Measure of Uncertainty 

A dictionary definition of ‘entropy’ is ‘a measure of the disorder of a system’. 
The word is famously associated with the Second Law of Thermodynamics, 
but in 1948 it found use in the hands of the American scientific genius Claude 
Shannon (1916-2001), the ‘father of the information age’, on whose theories 
rest the ideas of modern digital communication. A delicious eccentric, his house 
was home to five pianos and 30 other instruments, chess-playing machines 
(including one that moved the pieces with a three-fingered arm, beeped and 
made wry comments), rocket-powered Frisbees, motorized Pogo sticks, a mind- 
reading machine, a mechanical mouse that could navigate a maze and a device 
that could solve Rubik’s Cube. His love of juggling led to the invention of a 
machine with soft beanbag hands that juggled steel balls and also a tiny stage on
which three clowns juggle 11 rings, 7 balls, and 5 clubs, all driven by an invisible
mechanism of clockwork and rods. His love of the unicycle led to him using
one as transport along the corridors of the Bell Laboratories, where he worked
for many years. His love of both led to the design of a unicycle with asymmetric
gearing so that he could more easily juggle... as he unicycled along.

As an employee at the Bell Telephone Company, he was naturally interested 
in problems that arose from communication in all its forms, an interest which led 
to his influential 1948 paper, later to appear in book form as The Mathematical 
Theory of Communication, co-authored with the mathematician Warren Weaver. 
The ideas were embraced, made rigorous and expanded by Alexandre Khinchin 
(whose work on continued fractions we will touch on later in this chapter) in 
two important papers, to appear in English in 1959 as the book Mathematical 
Foundations of Information Theory. From there, the subject has blossomed 
into a critically important area of modern applied mathematics. It is from the 
Shannon-Weaver book that our first example is culled, as they quantify the 
concept of the disorder in a communication system, phrasing the idea in terms 
of probabilities. We will not follow him further as he develops the idea
into a series of seminal results, crucial to modern communication systems,
although the reader may well wish to consult either book to take the study
further; both are currently available. How can uncertainty be measured and
how do logarithms naturally appear as a measure? We let Claude Shannon tell 
us (also see Figure 14.1): 

6 Choice, uncertainty and entropy 

We have represented a discrete information source as a Markoff 
process. Can we define a quantity which will measure, in some 
sense, how much information is ‘produced’ by such a process, or 
better, at what rate information is produced? 

Suppose that we have a set of possible events whose probabilities
of occurrence are $p_1, p_2, \ldots, p_n$. These probabilities are known
but that is all we know concerning which event will occur. Can we
find a measure of how much ‘choice’ is involved in the selection
of the event or of how uncertain we are of the outcome?

Figure 14.2. A reasonable 2-state entropy graph.

If there is such a measure, say $H(p_1, p_2, \ldots, p_n)$, it is reasonable
to require of it the following properties:

1. H should be continuous in the $p_i$.

2. If all the $p_i$ are equal, $p_i = 1/n$, then H should be a
monotonically increasing function of n. With equally likely events
there is more choice, or uncertainty, when there are more
possible events.

3. If a choice be broken down into two successive choices, the
original H should be the weighted sum of the individual
values of H. The meaning of this is illustrated in Fig. 6. At the
left we have three possibilities $p_1 = \frac12$, $p_2 = \frac13$, $p_3 = \frac16$. On
the right we first choose between two possibilities, each with
probability $\frac12$, and if the second occurs make another choice
with probabilities $\frac23$, $\frac13$. The final results have the same
probabilities as before. We require, in this special case, that

$$H\left(\tfrac12, \tfrac13, \tfrac16\right) = H\left(\tfrac12, \tfrac12\right) + \tfrac12 H\left(\tfrac23, \tfrac13\right).$$

The coefficient $\frac12$ is because this second choice only occurs
half the time.

Theorem 2. The only H satisfying the three above assumptions is
of the form:

$$H = -K\sum_{i=1}^{n} p_i \log p_i,$$

where K is a positive constant.
Figure 14.3. The tree diagram for equally likely choices (s branches, each followed by $s^{m-1}$ branches).


We need not concern ourselves about the meaning of the most famous legacy 
of Chebychev’s student, Andrei Markov (1856-1922). Passing the reference 
by, the first two conditions are intuitively reasonable, but the third demands 
more care. Firstly, to develop a feeling for what is happening, suppose there
is only one event possible; then there is no uncertainty and we would want
$H(p_1) = H(1)$ to be 0. Now suppose that there are two possibilities; then we can
write $H(p_1, p_2) = H(p, 1-p)$. If p is close to 0 or 1, there is little uncertainty
and we intuitively feel that in these cases H should be near 0, with the maximum
uncertainty achieved when $p = \frac12$. We would then reasonably expect a graph
of $f(p) = H(p, 1-p)$ to look something like that in Figure 14.2. The third
condition carries with it the usual meaning of tree diagrams and is best looked
at in two stages. If all n choices are equally likely, write (as Shannon did)

$$H\left(\frac1n, \frac1n, \ldots, \frac1n\right) = A(n).$$

Now suppose that $n = s^m$ for some positive integers s and m; then the choice
can be made in two stages, as in Figure 14.3.

This makes

$$A(s^m) = A(s) + \frac{1}{s}A(s^{m-1})\times s = A(s) + A(s^{m-1}),$$

using the fact that we can make the remaining $s^{m-1}$ choices in s equally likely
ways, each with a probability of $1/s$. Repeating the process results in
$A(s^m) = mA(s) + A(1)$, and since $A(1) = H(1) = 0$, we have that $A(s^m) = mA(s)$, and
we can begin to discern properties of logarithms appearing.

We will continue to follow Shannon's reasoning as he develops the full
logarithmic behaviour of this equally likely form of uncertainty and from that
establishes the result for its most general form.

Fix positive integers s and t, with $s > 1$. For an arbitrarily large positive integer n,
choose the integer m so that $s^m \le t^n < s^{m+1}$, which makes (to any base)

$$\log s^m \le \log t^n < \log s^{m+1},$$

$$m\log s \le n\log t < (m+1)\log s,$$

$$\frac{m\log s}{n\log s} \le \frac{n\log t}{n\log s} < \frac{(m+1)\log s}{n\log s},$$

$$\frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n},$$

$$0 \le \frac{\log t}{\log s} - \frac{m}{n} < \frac{1}{n},$$

which we will write as

$$\left|\frac{\log t}{\log s} - \frac{m}{n}\right| < \frac{1}{n}.$$
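The squeeze can be watched numerically. The small Python sketch below (names ours) takes $s = 2$ and $t = 3$, finds m by exact integer arithmetic, and prints how $m/n$ homes in on $\log t/\log s$, always within $1/n$.

```python
import math

def squeeze_exponent(s: int, t: int, n: int) -> int:
    """Largest integer m with s**m <= t**n, using exact integers (s > 1, t >= 1)."""
    target = t ** n
    m, power = 0, 1
    while power * s <= target:
        power *= s
        m += 1
    return m

s, t = 2, 3
ratio = math.log(t) / math.log(s)
for n in (1, 10, 100, 1000):
    m = squeeze_exponent(s, t, n)
    print(n, m, m / n, ratio - m / n, 1 / n)   # the gap stays below 1/n
```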

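He then establishes a similar inequality for the function A.

We have that $A(s^m) \le A(t^n) \le A(s^{m+1})$, since A is increasing (condition 2),
and since $A(s^m) = mA(s)$ and $A(t^n) = nA(t)$, we have

$$mA(s) \le nA(t) \le (m+1)A(s),$$

$$\frac{mA(s)}{nA(s)} \le \frac{nA(t)}{nA(s)} \le \frac{(m+1)A(s)}{nA(s)},$$

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n},$$

and, as before,

$$\left|\frac{A(t)}{A(s)} - \frac{m}{n}\right| \le \frac{1}{n}.$$

Combining these gives

$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| \le \frac{2}{n},$$

and since n can be taken to be arbitrarily large, this makes

$$\frac{A(t)}{A(s)} = \frac{\log t}{\log s} \quad\text{and}\quad \frac{A(t)}{\log t} = \frac{A(s)}{\log s}$$

for all such s and t, which must mean that $A(t) = K_1\log t$, for some constant
$K_1$.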

The move to the general expression for H is made in the following way.
Suppose that we have n different choices $c_r$, with choice $c_r$ occurring $n_r$
times, $1 \le r \le n$. This means that there are

$$N = \sum_{r=1}^{n} n_r$$

possibilities in all, and that choice $c_r$ has probability $p_r = n_r/N$.

We can think of the available choice in two ways.

The possibilities can be listed as shown in Figure 14.4 and the choice be made
by considering them to be N possibilities, all equally likely, to give $A(N)$. Or
we can group the identical ones together, see which group is chosen and then
see which member of that group is chosen, which is represented in Figure 14.5.
Now our measure of choice is $H(p_1, p_2, \ldots, p_n) + \sum_{r=1}^{n} p_r A(n_r)$, where the
first part relates to the uncertainty of which box is chosen and the second part
to the uncertainty of which of the equally likely choices is made within a box.

Figure 14.4.

Figure 14.5.

Equating the two forms gives

$$A(N) = H(p_1, p_2, \ldots, p_n) + \sum_{r=1}^{n} p_r A(n_r),$$

which makes

$$K_1\log N = H(p_1, p_2, \ldots, p_n) + K_1\sum_{r=1}^{n} p_r\log n_r,$$

so, using $\sum_{r=1}^{n} p_r = 1$,

$$\begin{aligned}
H(p_1, p_2, \ldots, p_n) &= K_1\log N - K_1\sum_{r=1}^{n} p_r\log n_r \\
&= K_1\log N\sum_{r=1}^{n} p_r - K_1\sum_{r=1}^{n} p_r\log n_r \\
&= K_1\sum_{r=1}^{n} p_r\log\frac{N}{n_r} = -K_1\sum_{r=1}^{n} p_r\log p_r.
\end{aligned}$$

Note that $K_1$ must be positive, since $A(N) = K_1\log N$ must be an increasing
function of N (condition 2); so write $K = K_1$, with $K > 0$, to give

$$H(p_1, p_2, \ldots, p_n) = -K\sum_{r=1}^{n} p_r\log p_r.$$

This is established for probabilities that are rational; continuity (condition 1)
extends it to all probabilities.

The choice of a value for K is arbitrary, as is the base of the logarithms, and
Shannon points out that this concept of entropy ‘...will be recognized as that of
entropy as defined in certain formulations of statistical mechanics...’.

If we perform a small check with the case $n = 2$, choosing $K = 1$ and natural
logarithms, we have that $f(p) = H(p, 1-p) = -p\ln p - (1-p)\ln(1-p)$,
which looks like the graph in Figure 14.6, and which is what we would have
hoped for. Uncertainty is logarithmic, and very important.
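Both Shannon's grouping example and the shape of $f(p)$ can be checked numerically. A Python sketch (names ours), taking $K = 1$ and natural logarithms as in the check above, with the usual convention that $0\log 0 = 0$.

```python
import math

def entropy(probs, K=1.0, base=math.e):
    """H = -K * sum(p log p), skipping zero probabilities (0 log 0 = 0)."""
    return -K * sum(p * math.log(p, base) for p in probs if p > 0)

def binary_entropy(p):
    """f(p) = H(p, 1 - p) with K = 1 and natural logarithms."""
    return entropy([p, 1 - p])

# Shannon's grouping example: H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3)
lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])
print(lhs, rhs)

# f(p) vanishes at the ends and peaks at p = 1/2, where it equals ln 2
grid = [i / 1000 for i in range(1001)]
peak = max(grid, key=binary_entropy)
print(peak, binary_entropy(peak), math.log(2))
```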

14.2 Benford’s Law 

Logarithm tables have helped to solve countless problems since Napier's
invention of them, and they have created one too, a particularly strange phenomenon
that at first sight seems barely plausible, but to which they themselves are the
solution. Suppose that an English-speaking student is learning the French
language and has a combined English/French and French/English dictionary, split
into two halves. It is very likely that the English/French half of the book will
be used more than the other, and we would expect the book to show uneven
signs of wear as time goes by; there is no surprise here. A book of logarithms is
different. If, over time, it is used for a variety of calculations, we would expect
its use to be evenly distributed throughout its pages: it isn't.

The distinguished American mathematician and astronomer Simon Newcomb
(1835-1909) was made a Foreign Member of The Royal Society on 13
December 1877, exactly the same date as Chebychev was so honoured. We have
mentioned Chebychev before and more of his mathematics will be discussed in
Chapter 15, but Newcomb's offbeat observation has its place here. He noticed
that the books of logarithms that he shared with other scientists showed greater
signs of use at their beginning than they did at their end. Since log tables are
arranged in ascending numeric order, this suggested that more numbers with
small rather than large first significant digits were being used for calculation.
Yet, all sorts of numbers of all sorts of sizes were being dealt with; why didn't
the distribution of their most significant digits even out? Newcomb's
investigations led him to the empirical law that the fraction of numbers that start with the
digit d is not the intuitively reasonable $\frac19$ but the remarkable $\log_{10}(1 + 1/d)$. In
1881 he mentioned the phenomenon in a brief article in the American Journal of
Mathematics but, without the mathematical justification to support it, it was no
more than a curiosity and disappeared from the mathematical landscape until
1938, when Frank Benford, a physicist at G.E., noticed precisely the same thing.

Table 14.1. Distribution of first significant digits.

d    Intuitive probability    Suggested probability
1    0.111...                 0.30103
2    0.111...                 0.17609
3    0.111...                 0.12494
4    0.111...                 0.09691
5    0.111...                 0.07918
6    0.111...                 0.06695
7    0.111...                 0.05799
8    0.111...                 0.05115
9    0.111...                 0.04576

Figure 14.7.


146 


IT’S A LOGARITHMIC WORLD 


The distribution of first significant digits did not appear to be a uniform $\frac19$ but
that remarkable $\log_{10}(1 + 1/d)$, summarized in Table 14.1 and Figure 14.7,
which compare the intuitive and suggested probabilities of the first significant
digit appearing.

If the suggested probabilities are the true measure of the frequency of
occurrence of naturally occurring numbers, it is small wonder that at some time
someone would notice that the front of a book of log tables is about six times
more dirty than the back.
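The suggested probabilities are quick to tabulate. The Python sketch below (names ours) reproduces the last column of Table 14.1, confirms that the nine values sum to 1, and compares the first and last "pages". As an illustration of our own choosing that is not in the text, it also counts the leading digits of $2^1, \ldots, 2^{1000}$.

```python
import math

def benford(d: int) -> float:
    """Suggested probability log10(1 + 1/d) that the first significant digit is d."""
    return math.log10(1 + 1 / d)

def first_digit(x: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(x)[0])

probs = {d: benford(d) for d in range(1, 10)}
print({d: round(p, 5) for d, p in probs.items()})
print("total:", sum(probs.values()))
print("digit 1 versus digit 9:", probs[1] / probs[9])

# Illustration of our own: leading digits of 2**1, 2**2, ..., 2**1000
counts = [0] * 10
for k in range(1, 1001):
    counts[first_digit(2 ** k)] += 1
print([round(counts[d] / 1000, 3) for d in range(1, 10)])
```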

To add weight to the hypothesis, Benford compiled a table of 20 229 numbers,
including such wildly disparate categories as the areas of rivers, death rates,
baseball statistics, numbers in magazine articles and the street addresses of
the first 342 people listed in the book 'American Men of Science'. The table
is reproduced in Table 14.2 and is largely consistent with the idea that these
seemingly unrelated sets of numbers follow the same first-digit probability
pattern as the worn pages of the logarithm tables.

The assertion that the distribution of first significant digits is $\log_{10}(1 + 1/d)$ has
subsequently become known as Benford's Law. But where is the mathematics
to support it? The counterintuitive nature of the law is a phenomenon seen
elsewhere in probability theory, perhaps most familiarly in the 'birthday paradox'
(which shows us that only 23 people are needed for the probability that at least
two of them share a birthday to exceed one half). Theodore Hill of
the Georgia Institute of Technology refers to another when he has his students
choose between tossing a fair coin 200 times or faking the results. It is natural
for the fakers to mix up the sequence of heads and tails as much as possible but,
as he points out, 'the overwhelming odds are that at some point in a series of
200 tosses, either heads or tails will come up six or more times in a row'.
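Both counterintuitive claims can be checked with a few lines of exact calculation. The sketch below has two assumptions of our own. First, the 365 birthdays are equally likely. Second, the run probability is tracked by a small dynamic programme over the length of the current run. The function names are hypothetical.

```python
from math import prod


def p_shared_birthday(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday), assuming `days` equally likely birthdays."""
    p_all_distinct = prod((days - i) / days for i in range(n))
    return 1 - p_all_distinct


def p_run_at_least(length: int, tosses: int) -> float:
    """P(`tosses` fair coin flips contain a run of heads or tails of at least `length`)."""
    if length < 2:
        raise ValueError("length must be at least 2")
    # state[r] = P(no run of `length` so far, and the current run has length r), 1 <= r < length
    state = [0.0] * length
    state[1] = 1.0  # after the first toss the current run has length 1
    for _ in range(tosses - 1):
        new_state = [0.0] * length
        for r in range(1, length):
            new_state[1] += state[r] / 2  # next toss differs: a new run starts
            if r + 1 < length:
                new_state[r + 1] += state[r] / 2  # next toss matches: the run grows
            # when r + 1 == length the run is long enough, so that probability leaves the states
        state = new_state
    return 1 - sum(state)


if __name__ == "__main__":
    print(min(n for n in range(1, 366) if p_shared_birthday(n) > 0.5))
    print(f"{p_run_at_least(6, 200):.3f}")
```

The first line prints 23. The second shows that a run of six or more in 200 tosses is very likely indeed.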

Many sets of numbers certainly do not obey Benford's Law: random numbers
at one extreme and, at the other, numbers that are governed by some other statistical
distribution, perhaps Uniform or Normal. It seems that for data to
conform to the law they need just the right amount of structure. The row of
averages at the foot of Benford's table, with its excellent fit to the law, hints at
what is going on, and it was Hill who saw into it. In 1996 he showed that if
distributions are selected at random and random samples are taken from each
of these distributions, the significant-digit frequencies of the combined sample
converge to Benford's Law, even though the individual
distributions selected may not conform. Hill describes this as taking 'random samples from random
distributions'. In a sense, Benford's Law is the distribution of distributions!
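A toy simulation gives the flavour of 'random samples from random distributions'. It is not Hill's construction. The sampling scheme is a hypothetical one of our own: pick a distribution family at random, give it a scale that is log-uniform over six whole decades, then take one sample. The log-uniform scale is what drives the result here, so treat this as an illustration rather than a proof.

```python
import math
import random
from collections import Counter


def first_digit(x: float) -> int:
    """First significant digit of a positive number, via the fractional part of log10(x)."""
    return min(int(10 ** (math.log10(x) % 1)), 9)


def draw_from_random_distribution(rng: random.Random) -> float:
    """One value from a randomly chosen distribution (hypothetical scheme, for illustration)."""
    scale = 10 ** rng.uniform(0, 6)  # assumption: log-uniform scale over six decades
    family = rng.choice(("uniform", "exponential", "triangular"))
    if family == "uniform":
        return scale * (1.0 - rng.random())  # uniform on (0, 1], scaled
    if family == "exponential":
        return scale * rng.expovariate(1.0)
    return scale * rng.triangular(0.5, 1.5)  # narrow, and far from Benford on its own


def combined_first_digit_frequencies(samples: int, seed: int = 1) -> list:
    """First-digit frequencies of the pooled sample, for d = 1..9."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        x = draw_from_random_distribution(rng)
        if x > 0:  # expovariate can, very rarely, return zero
            counts[first_digit(x)] += 1
    total = sum(counts.values())
    return [counts[d] / total for d in range(1, 10)]


if __name__ == "__main__":
    for d, f in zip(range(1, 10), combined_first_digit_frequencies(100_000)):
        print(d, round(f, 3), round(math.log10(1 + 1 / d), 3))
```

On its own, the unscaled triangular distribution puts about half of its values on first digit 1 and none on digits 2 to 4. The pooled sample, by contrast, lands close to Benford's Law.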

There are other ways of approaching the phenomenon. If such a law is to be
universal, it must, for example, apply to the base 5 system of counting of the
Arawaks of North America, the base 20 system of the Tamanas of the Orinoco
and the base 60 system of the Babylonians, as well as to the exotic Basque system,
which uses base 10 up to 19, base 20 from 20 to 99 and then reverts to base 10.




Table 14.2. Benford's data.

                                                     First digit
Title                                1     2     3     4     5     6     7     8     9  Samples
Rivers, area                      31.0  16.4  10.7  11.3   7.2   8.6   5.5   4.2   5.1      335
Population                        33.9  20.4  14.2   8.1   7.2   6.2   4.1   3.7   2.2     3259
Physical constants                41.3  14.4   4.8   8.6  10.6   5.8   1.0   2.9  10.6      104
Numbers from newspaper articles   30.0  18.0  12.0  10.0   8.0   6.0   6.0   5.0   5.0      100
Specific heat                     24.0  18.4  16.2  14.6  10.6   4.1   3.2   4.8   4.1     1389
Pressure                          29.6  18.3  12.8   9.8   8.3   6.4   5.7   4.4   4.7      703
H.P. lost                         30.0  18.4  11.9  10.8   8.1   7.0   5.1   5.1   3.6      690
Molecular weight                  26.7  25.2  15.4  10.8   6.7   5.1   4.1   2.8   3.2     1800
Drainage                          27.1  23.9  13.8  12.6   8.2   5.0   5.0   2.5   1.9      159
Atomic weight                     47.2  18.7   5.5   4.4   6.6   4.4   3.3   4.4   5.5       91
n^-1, n^1/2, ...                  25.7  20.3   9.7   6.8   6.6   6.8   7.2   8.0   8.9     5000
Design                            26.8  14.8  14.3   7.5   8.3   8.4   7.0   7.3   5.6      560
Reader's Digest data              33.4  18.5  12.4   7.5   7.1   6.5   5.5   4.9   4.2      308
Cost data                         32.4  18.8  10.1  10.1   9.8   5.5   4.7   5.5   3.1      741
X-ray volts                       27.9  17.5  14.4   9.0   8.1   7.4   5.1   5.8   4.8      707
American League                   32.7  17.6  12.6   9.8   7.4   6.4   4.9   5.6   3.0     1458
Blackbody                         31.0  17.3  14.1   8.7   6.6   7.0   5.2   4.7   5.4     1165
Addresses                         28.9  19.2  12.6   8.8   8.5   6.4   5.6   5.0   5.0      342
Mathematical constants            25.3  16.0  12.0  10.0   8.5   8.8   6.8   7.1   5.5      900
Death rate                        27.0  18.6  15.7   9.4   6.7   6.5   7.2   4.8   4.1      418
Average                           30.6  18.5  12.4   9.4   8.0   6.4   5.1   4.9   4.7     1011
Probable error (±)                 0.8   0.4   0.4   0.3   0.2   0.2   0.2   0.2   0.3
The law must be base independent. And indeed it is, since base independence
of data has been shown to imply Benford's Law.
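Base independence can be illustrated with a sequence that is known to follow the law in base 10. We use the powers of 2, which is our own choice of test sequence. The sketch checks that the leading digits of the same numbers, written in bases 5, 20 and 60, follow $\log_b(1 + 1/d)$. It works in floating point through the fractional part of $n \log_b 2$, so it is a numerical illustration, not a proof.

```python
import math
from collections import Counter


def leading_digit_frequencies(a: int, count: int, base: int) -> list:
    """Frequencies of the leading base-`base` digit of a**1, ..., a**count (floating-point sketch)."""
    step = math.log(a, base)
    counts = Counter()
    for n in range(1, count + 1):
        mantissa = base ** ((n * step) % 1)  # lies in [1, base), up to rounding
        counts[min(int(mantissa), base - 1)] += 1
    return [counts[d] / count for d in range(1, base)]


def benford(base: int) -> list:
    """Benford probabilities log_base(1 + 1/d) for d = 1, ..., base - 1."""
    return [math.log(1 + 1 / d, base) for d in range(1, base)]


if __name__ == "__main__":
    for b in (5, 10, 20, 60):
        observed = leading_digit_frequencies(2, 10_000, b)
        gap = max(abs(o - e) for o, e in zip(observed, benford(b)))
        print(f"base {b}: largest deviation from Benford = {gap:.4f}")
```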

The units of measurement should not matter either. For example, the fast-disappearing
British Imperial system of measurement of length and mass is

12 inches = 1 foot,                       16 ounces = 1 pound,
3 feet = 1 yard,                          14 pounds = 1 stone,
5½ yards = 1 pole (or rod, or perch),     2 stones = 1 quarter,
4 poles = 1 chain,                        4 quarters = 1 hundredweight,
10 chains = 1 furlong,                    20 hundredweights = 1 ton.
8 furlongs = 1 mile,


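(Incidentally, these are nothing more than examples of a finite mixed-base measuring
system, as discussed on p. 99. For example, with the length data, suppose
that we have the imperial distance of 7 miles, 5 furlongs, 3 chains, 1 pole,
2 yards, 1 foot and 11 inches. In miles this is the expression

$$\begin{aligned}
&7 + 5 \times \frac{1}{8} + 3 \times \frac{1}{8} \times \frac{1}{10} + 1 \times \frac{1}{8} \times \frac{1}{10} \times \frac{1}{4} + 2 \times \frac{1}{8} \times \frac{1}{10} \times \frac{1}{4} \times \frac{1}{5\frac{1}{2}} \\
&\quad + 1 \times \frac{1}{8} \times \frac{1}{10} \times \frac{1}{4} \times \frac{1}{5\frac{1}{2}} \times \frac{1}{3} + 11 \times \frac{1}{8} \times \frac{1}{10} \times \frac{1}{4} \times \frac{1}{5\frac{1}{2}} \times \frac{1}{3} \times \frac{1}{12} \\
&= 7 + \frac{1}{8}\left(5 + \frac{1}{10}\left(3 + \frac{1}{4}\left(1 + \frac{1}{5\frac{1}{2}}\left(2 + \frac{1}{3}\left(1 + \frac{1}{12}(11)\right)\right)\right)\right)\right) \\
&= [7; 5, 3, 1, 2, 1, 11] \approx 7.6671 \text{ miles.)}
\end{aligned}$$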
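The nested form above is Horner's rule for a mixed-base number, so it can be evaluated exactly with rational arithmetic. In the sketch below, the unit ratios come from the list of Imperial units. The names `RADICES` and `mixed_base_to_miles` are our own.

```python
from fractions import Fraction

# Ratios between successive length units, largest first, from the list above:
# 8 furlongs/mile, 10 chains/furlong, 4 poles/chain, 5 1/2 yards/pole, 3 feet/yard, 12 inches/foot
RADICES = [8, 10, 4, Fraction(11, 2), 3, 12]


def mixed_base_to_miles(digits: list) -> Fraction:
    """Evaluate [d0; d1, ..., d6] = d0 + (1/8)(d1 + (1/10)(d2 + ...)) exactly, innermost first."""
    if len(digits) != len(RADICES) + 1:
        raise ValueError("expected miles, furlongs, chains, poles, yards, feet and inches")
    value = Fraction(digits[-1])
    for digit, radix in zip(reversed(digits[:-1]), reversed(RADICES)):
        value = digit + value / radix
    return value


if __name__ == "__main__":
    miles = mixed_base_to_miles([7, 5, 3, 1, 2, 1, 11])
    print(miles, float(miles))
```

The exact value is 485 789/63 360 miles, which is the distance in inches (485 789) divided by the 63 360 inches in a mile. That is about 7.66712 miles.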


Euler's manuscript, 'Meditations upon experiments made recently on the firing
of cannon', concerned a series of seven experiments carried out in 1727 and
forever cast the letter e as the base of the natural logarithm; in it he
measured the cannon ball's diameter in 'scruples of Rhenish feet'. Surely the
same cannon balls would or would not conform to Benford's Law whether their
diameters were measured in the British Imperial system, in Euler's Rhenish units, in our modern
metric system or indeed in any other system of measurement. The same point
can be made for their masses too. In 1961, Roger Pinkham, a mathematician
then at Rutgers University in New Brunswick, proved just that: scale invariance
implies Benford's Law. It is this fact that we will focus on, showing how
such a result can be established.

A change of units is achieved by multiplying by some scaling number and,
before we immerse ourselves in the mathematics, we can get a feel for the
phenomenon by seeing what happens when we do just that in a particular case.
Suppose that we take a hypothetical set of 100 'cannon balls' of diameters 1-100
scruples of Rhenish feet, order them by size in descending order and plot order against
diameter, to arrive at Figure 14.8(a). Now we change units by multiplying each
diameter by a random number (in this case, between 0 and 1), and then again,
and once more, re-ordering each time to get Figure 14.8(b)-(d).

Figure 14.8. The effect of scaling, panels (a)-(d).

The same shapes result from any scalings, and the concavity of the resulting
curves makes bigger numbers rarer. The eye encourages the
thought that the plots are approximating some limiting curve. Which curve?
Figure 14.9 is a scaled plot of $\log_{10}(1 + 1/\text{diameter})$, which makes one think.

More specifically, consider data whose mantissas (the values rescaled by a power of 10 into $[1, 10)$) are uniformly
distributed, so that each first significant digit is equally likely, and
then suppose that we change the units by multiplying by 2. The first significant
digits of the data after the rescaling are given in Table 14.3, which gives rise to
the bar chart in Figure 14.10. Equally likely digits are not scale invariant.
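The bookkeeping behind Table 14.3 can be done for any scaling factor $c$. For each digit $d$, find the mantissas $m \in [1, 10)$ that $cm$ sends into some interval $[d \cdot 10^k, (d+1) \cdot 10^k)$, and measure that set. The sketch below does this for two hypothetical mantissa laws: uniform on $[1, 10)$, and logarithmic, meaning Benford. It assumes $0.01 \le c < 100$.

```python
import math


def digit_probs_after_scaling(c: float, measure) -> list:
    """P(first digit of c*m is d), d = 1..9, for a mantissa m on [1, 10).

    `measure(lo, hi)` is the probability that m lies in [lo, hi); 0.01 <= c < 100 is assumed.
    """
    probs = []
    for d in range(1, 10):
        total = 0.0
        for k in range(-2, 3):  # the m that send c*m into [d*10**k, (d+1)*10**k)
            lo = max(1.0, d * 10.0 ** k / c)
            hi = min(10.0, (d + 1) * 10.0 ** k / c)
            if lo < hi:
                total += measure(lo, hi)
        probs.append(total)
    return probs


def uniform_mantissa(lo: float, hi: float) -> float:
    return (hi - lo) / 9  # m uniform on [1, 10): every first digit equally likely


def benford_mantissa(lo: float, hi: float) -> float:
    return math.log10(hi / lo)  # log10(m) uniform on [0, 1)


if __name__ == "__main__":
    print([round(p, 4) for p in digit_probs_after_scaling(2, uniform_mantissa)])
    print([round(p, 4) for p in digit_probs_after_scaling(2, benford_mantissa)])
```

With uniform mantissas, doubling gives digit 1 a probability of 5/9 and each other digit 1/18, matching the intervals of Table 14.3. With Benford mantissas, the probabilities are unchanged by the scaling.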
Table 14.3. Effect of multiplication by 2.

Interval      First significant digit after ×2
[1, 1.5)      2
[1.5, 2)      3
[2, 2.5)      4
[2.5, 3)      5
[3, 3.5)      6
[3.5, 4)      7
[4, 4.5)      8
[4.5, 5)      9
[5, 10)       1



Figure 14.10. The expected and actual frequencies
(distribution of first significant digits).


Now to some mathematics; we will give a statistical definition of scale invariance
and use it to show that scale invariance does indeed imply Benford's Law.

We need the ideas of the probability density function $\varphi(x)$ and the cumulative
distribution function $\Phi(x)$ of a continuous random variable. These definitions are
the usual

$$P(a \le X \le b) = \int_a^b \varphi(x)\,dx,$$

where $\Phi(x) = P(X \le x) = \int_{-\infty}^{x} \varphi(t)\,dt$ and therefore $d\Phi(x)/dx = \varphi(x)$.

We will say that a random variable $X$ is scale invariant if the probabilities that
it lies in any interval before and after scaling (i.e. multiplying) by any factor (say
$1/\alpha$) are the same, not worrying about the details of any domains of definition.
If we fix on a lower limit and allow the upper limit to vary, we could write this
as

$$P(a < X < x) = P\left(a < \frac{X}{\alpha} < x\right) = P(\alpha a < X < \alpha x),$$

which means that

$$\Phi(\alpha x) - \Phi(\alpha a) = \Phi(x) - \Phi(a) \quad\text{or}\quad \Phi(\alpha x) = \Phi(x) + K_\alpha \quad\text{for all } \alpha,$$

where $K_\alpha = \Phi(\alpha a) - \Phi(a)$ does not depend on $x$. If we differentiate both sides of the above identity with respect to $x$, we get
$\alpha\varphi(\alpha x) = \varphi(x)$ and therefore $\varphi(\alpha x) = (1/\alpha)\varphi(x)$.




Now let $Y$ be the random variable $Y = \log_b X$, with $\psi(y)$ and $\Psi(y)$ defined
analogously. Then

$$\Psi(y) = P(Y < y) = P(\log_b X < y) = P(X < b^y) = \Phi(b^y) = \Phi(x),$$

where $x = b^y$. This means that

$$\psi(y) = \frac{d}{dy}\Psi(y) = \frac{d}{dy}\Phi(x) = \frac{d}{dx}\Phi(x) \times \frac{dx}{dy}$$

and, since $dx/dy = b^y \ln b = x \ln b$,

$$\psi(y) = \varphi(x) \times \frac{dx}{dy} = x\varphi(x)\ln b, \qquad\text{that is,}\qquad \psi(\log_b x) = x\varphi(x)\ln b,$$

which means that

$$\psi(\log_b \alpha x) = \alpha x\,\varphi(\alpha x)\ln b.$$

Using the scale invariance we then have

$$\psi(\log_b \alpha x) = \alpha x\,\varphi(\alpha x)\ln b = \alpha x \cdot \frac{1}{\alpha}\varphi(x)\ln b = x\varphi(x)\ln b = \psi(\log_b x).$$

Therefore,

$$\psi(\log_b x + \log_b \alpha) = \psi(\log_b x) \quad\text{and}\quad \psi(y + \log_b \alpha) = \psi(y).$$

Since $\alpha$ can be chosen to be anything we wish, $\psi(y)$ repeats itself over
arbitrary intervals and it can only be that it is constant. The logarithm of a
scale-invariant variable has a constant probability density function.

Table 14.4. Second digit probabilities.

d    Theoretical probability    Actual probability
0    0.1                        0.11968
1    0.1                        0.11389
2    0.1                        0.10882
3    0.1                        0.10433
4    0.1                        0.10031
5    0.1                        0.09667
6    0.1                        0.09337
7    0.1                        0.09035
8    0.1                        0.08757
9    0.1                        0.08499

We can now relate this to the first-digit phenomenon by expressing the numbers
in scientific notation $x \times 10^n$, where $1 \le x < 10$; the first significant digit
$d$ of the number is simply the first digit of $x$. As we scale the number, we
scale $x$, adjusting the power of 10 whenever $x$ leaves $[1, 10)$. In this way, we can always think of $x$ as satisfying
$1 \le x < 10$, whether scaled or not, and if we take the base of the logarithms to be
10, $y = \log_{10} x$ will have a constant probability density function of 1 on
$[0, 1]$. Therefore, assuming the scale invariance above, for $n \in \{1, 2, \ldots, 9\}$

$$\begin{aligned}
P(d = n) &= P(n \le x < n + 1) \\
&= P(\log_{10} n \le \log_{10} x < \log_{10}(n + 1)) \\
&= P(\log_{10} n \le y < \log_{10}(n + 1)) \\
&= (\log_{10}(n + 1) - \log_{10} n) \times 1 \\
&= \log_{10}\left(\frac{n + 1}{n}\right) = \log_{10}\left(1 + \frac{1}{n}\right),
\end{aligned}$$

which is Benford's Law.
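The derivation can be mirrored by simulation. Take $y$ uniform on $[0, 1)$, form $x = 10^y$, rescale by an arbitrary factor, and read off the first digit. The Python sketch below does this. The seed, sample size and scale factors are arbitrary choices, and the names are our own.

```python
import math
import random
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    return min(int(10 ** (math.log10(x) % 1)), 9)


def simulated_first_digits(scale: float, samples: int = 100_000, seed: int = 0) -> list:
    """First-digit frequencies of scale * 10**Y, with Y uniform on [0, 1) as in the derivation."""
    rng = random.Random(seed)
    counts = Counter(leading_digit(scale * 10 ** rng.random()) for _ in range(samples))
    return [counts[d] / samples for d in range(1, 10)]


if __name__ == "__main__":
    for scale in (1.0, 2.0, 3.7, 1 / 7):
        freqs = simulated_first_digits(scale)
        print(scale, [round(f, 3) for f in freqs])
```

Whatever the scale, the frequencies stay close to $\log_{10}(1 + 1/d)$. That is the scale invariance the argument relies on.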

The analysis can be extended to look at the frequency of subsequent digits
in the data. For example, if we write the number as $X \times 10^n$, where $10 \le X < 100$,
so that its first two significant digits $x_1$ and $x_2$ form the integer part $10x_1 + x_2$ of $X$, we get

$$\begin{aligned}
P(\text{1st significant digit is } x_1 \text{ and the 2nd is } x_2) &= P(10x_1 + x_2 \le X < 10x_1 + x_2 + 1) \\
&= \log_{10}\left(1 + \frac{1}{10x_1 + x_2}\right).
\end{aligned}$$

Extending the argument gives

$$P(\text{2nd significant digit is } x_2) = \sum_{r=1}^{9} \log_{10}\left(1 + \frac{1}{10r + x_2}\right)$$

and so on. Table 14.4 shows the full set of probabilities for the appearance of second
digits, with 0 now a possible value.
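The second-digit sum above is easy to evaluate. The short sketch below writes out the terms explicitly, with the two-digit start $x_1x_2$ written as $10x_1 + x_2$. The function names are our own.

```python
import math


def p_first_two(x1: int, x2: int) -> float:
    """P(first digit x1 and second digit x2) = log10(1 + 1/(10*x1 + x2))."""
    return math.log10(1 + 1 / (10 * x1 + x2))


def p_second(x2: int) -> float:
    """P(second digit x2): sum over the nine possible first digits."""
    return sum(p_first_two(r, x2) for r in range(1, 10))


if __name__ == "__main__":
    for d in range(10):
        print(d, f"{p_second(d):.5f}")
```

The probabilities decrease steadily from 0 to 9, and all ten of them are much closer to $\frac{1}{10}$ than the first-digit probabilities are to $\frac{1}{9}$.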
Table 14.5. First digits of the first 1000 Fibonacci numbers.

Digit          1    2    3    4    5    6    7    8    9
Frequency    301  177  125   96   80   67   56   53   45
Percentage    30   18   13   10    8    7    6    5    5


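Using the standard result of conditional probability that

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)},$$

we have

$$P(\text{2nd significant digit is } x_2 \mid \text{1st significant digit is } x_1) = \frac{\log_{10}\left(1 + \dfrac{1}{10x_1 + x_2}\right)}{\log_{10}\left(1 + \dfrac{1}{x_1}\right)}.$$

So, for example, the probability that the second digit of a number is 5 given
that its first digit is 6 is

$$\frac{\log_{10}\left(1 + \frac{1}{65}\right)}{\log_{10}\left(1 + \frac{1}{6}\right)} \approx 0.0990,$$

whereas if it started with 9 the probability is

$$\frac{\log_{10}\left(1 + \frac{1}{95}\right)}{\log_{10}\left(1 + \frac{1}{9}\right)} \approx 0.0994.$$

The most likely start to a number turns out to be 10. Its unconditional probability,
$\log_{10}\left(1 + \frac{1}{10}\right) \approx 0.0414$, is the largest for any pair of leading digits, and given
that the first digit is 1, the probability that the second digit is 0 is

$$\frac{\log_{10}\left(1 + \frac{1}{10}\right)}{\log_{10}\left(1 + \frac{1}{1}\right)} \approx 0.1375.$$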
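The conditional probabilities just quoted can be reproduced as follows. The sketch writes each two-digit start as $10x_1 + x_2$. It also searches all 90 two-digit starts for the most probable one. The function names are our own.

```python
import math


def p_second_given_first(x2: int, x1: int) -> float:
    """P(second digit x2 | first digit x1) under Benford's Law."""
    return math.log10(1 + 1 / (10 * x1 + x2)) / math.log10(1 + 1 / x1)


def most_likely_start() -> tuple:
    """The pair of leading digits (x1, x2) with the largest unconditional probability."""
    pairs = [(x1, x2) for x1 in range(1, 10) for x2 in range(10)]
    return max(pairs, key=lambda p: math.log10(1 + 1 / (10 * p[0] + p[1])))


if __name__ == "__main__":
    print(round(p_second_given_first(5, 6), 4), round(p_second_given_first(5, 9), 4))
    print(most_likely_start(), round(p_second_given_first(0, 1), 4))
```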


Having made an appearance, 0 is the most common second digit, but the
probabilities are beginning to level out and are nearer the uniform $\frac{1}{10}$ that
intuition suggests; as we move along the digits of a number
the distribution does approach uniformity and intuition is eventually right.

As we have seen, all manner of diverse data conform to the law. Table 14.5
suggests that the Fibonacci numbers do too.
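Table 14.5 can be recomputed with exact integer arithmetic. We start the Fibonacci sequence at $F_1 = F_2 = 1$, which is our assumption about the book's convention. The function name is our own.

```python
from collections import Counter


def fibonacci_first_digits(count: int) -> Counter:
    """Counts of the first digits of F_1, F_2, ..., F_count, with F_1 = F_2 = 1."""
    counts = Counter()
    a, b = 1, 1
    for _ in range(count):
        counts[int(str(a)[0])] += 1  # Python integers are exact, so str() gives every digit
        a, b = b, a + b
    return counts


if __name__ == "__main__":
    counts = fibonacci_first_digits(1000)
    print([counts[d] for d in range(1, 10)])
```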

A study by B. Buck and A. C. Merchant of the University of Oxford and 
S. M. Perez of the University of Cape Town showed that alpha decay half-lives 
(the time it takes atomic nuclei to lose half their radioactivity by emitting alpha 
particles) obey Benford’s Law both observationally and theoretically. They also 
remarked that the same behaviour has been observed in monthly electricity bills 
in the Solomon Islands, the street addresses of eminent American scientists,
and the initial digits of 20 of the more important physical constants. Of much 
more practical interest, financial data seem also to conform; in fact, Benford’s 
Law can be used to test for fraudulent data in income tax returns and other 
financial reports. Mark Nigrini has made a specialization of this sort of ‘forensic 
auditing’, which is called digital analysis. He has written: 

Benford's Law provides auditors with the expected digit frequencies
in tabulated data. By examining the digit and the number frequencies,
auditors can gain data insights that might be missed using
traditional analytical procedures and sampling methods. The digit
and number patterns could point to number invention, systematic
frauds, data errors, or biases in the data. Research is currently
underway on advanced tests to detect anomalies in data subsets.

One case in which he was involved illustrates his point. Using digital analysis,
a company's audit director discovered something odd about the claims being
made by the supervisor of the company's healthcare department. The first two
digits of the healthcare payments were checked for conformity with Benford's
Law, and this revealed a spike in numbers beginning with the digits 65. An
audit showed 13 fraudulent cheques for between $6500 and $6599 related to
fraudulent heart surgery claims processed by the supervisor, with the cheques
ending up in her hands. The analysis also uncovered other fraudulent claims
worth around $1 million in total.
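A first-two-digits test of this kind can be sketched in a few lines. This is not Nigrini's software. The z-statistic with a continuity correction is one common choice in digital analysis, and the threshold, the skipping of amounts under 10, and the synthetic payments are all our own assumptions for illustration.

```python
import math
import random
from collections import Counter


def first_two_digits(amount: float) -> int:
    """Leading two significant digits (10..99) of a positive amount, read from scientific notation."""
    mantissa = f"{amount:.6e}"  # e.g. '6.512340e+03'
    return int(mantissa[0] + mantissa[2])


def first_two_digit_test(amounts, z_threshold: float = 3.0) -> dict:
    """Two-digit prefixes whose observed frequency departs sharply from Benford's Law."""
    # Amounts under 10 have no genuine second digit, so they are skipped (our assumption).
    counts = Counter(first_two_digits(a) for a in amounts if a >= 10)
    n = sum(counts.values())
    flagged = {}
    for prefix in range(10, 100):
        expected = math.log10(1 + 1 / prefix)
        observed = counts[prefix] / n
        # z-statistic with a continuity correction (one common choice, not the only one)
        z = (abs(observed - expected) - 1 / (2 * n)) / math.sqrt(expected * (1 - expected) / n)
        if z > z_threshold:
            flagged[prefix] = round(z, 1)
    return flagged


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical payments: log-uniform over three decades, hence Benford-distributed ...
    payments = [10 ** rng.uniform(2, 5) for _ in range(10_000)]
    # ... plus 200 invented amounts between $6500 and $6599, mimicking the case above
    payments += [rng.uniform(6500, 6599) for _ in range(200)]
    print(first_two_digit_test(payments))
```

With the injected amounts, the prefix 65 stands out with a very large z value. This is the same kind of spike the audit director saw.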

This novel and important accounting technique has, of course, spawned Web
sites devoted to the production of Benford-compliant data, not for illegal or
immoral use, naturally!

14.3 Continued-Fraction Behaviour 

A look back at the continued fractions in Chapter 11 might bring to the reader
the thought that 1 appears a great deal in the continued fraction form of a
number and that, on the whole, the partial quotients are small (although by no
means exclusively so, with the 431st partial quotient of $\pi$ being 20 776, the 5040th of $\gamma$ being 11 626
and the mere 5th of $\pi^4$ being 16 539). Gauss noticed this too and went much further
when he wrote to Laplace on 30 January 1812 about a 'curious problem' that had
occupied him for 12 years and which he was unable to resolve to his satisfaction.
We will take the reader through what must have been the equivalent of Gauss's
reasoning, which led to one of the most remarkable results it is possible to
imagine.

Suppose that $X$ is a random variable defined on $\mathbb{R}^+$ and that we write $\{X\}$ for
the fractional part of $X$. If the fractional part of $X$ is uniformly distributed, then
$P(\{X\} < x) = x$ for $0 \le x < 1$; but suppose that it is not. Then this probability
will depend on the distribution of $X$, and we would have to divide up the real
line, as in Figure 14.11, to get

$$P(\{X\} < x) = \sum_{k=1}^{\infty} \bigl(P(X < k + x) - P(X < k)\bigr).$$


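All well and good, but now we apply this idea to continued fractions.
Define $\xi_n$ by

$$\xi_n = a_n + \cfrac{1}{a_{n+1} + \cfrac{1}{a_{n+2} + \cdots}} = a_n + \frac{1}{\xi_{n+1}},$$

in which case $1/\xi_{n+1}$ is the fractional part of $\xi_n$. Now let

$$\begin{aligned}
\omega_n(x) = P(\{\xi_n\} < x) &= \sum_{k=1}^{\infty} \bigl(P(\xi_n < k + x) - P(\xi_n < k)\bigr) \\
&= \sum_{k=1}^{\infty} \left(\omega_{n-1}\left(\frac{1}{k}\right) - \omega_{n-1}\left(\frac{1}{k + x}\right)\right),
\end{aligned}$$

since $P(\xi_n < t) = P(1/\xi_n > 1/t) = 1 - \omega_{n-1}(1/t)$. We now have a recurrence relation for $\omega_n(x)$; the question is, can we find an
explicit formula? An intuitive way forward is to argue that, since the relation
holds for all $n$, if the limit $\omega(x)$ exists as $n \to \infty$, we can reasonably hope for
it to satisfy

$$\omega(x) = \sum_{k=1}^{\infty} \left(\omega\left(\frac{1}{k}\right) - \omega\left(\frac{1}{k + x}\right)\right)$$

and, remembering that $\omega(x)$ is the limit of the probability of a fraction being
less than $x$, it should be that $\omega(0) = 0$ and $\omega(1) = 1$, which is where mortals
might leave the matter.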
Figure 14.11. (The real line divided at the integers 1, 2, ..., k, k + 1.)


Gauss mentioned in his letter to Laplace that he 'could prove by a very simple
argument' that $\omega(x) = \log_2(1 + x)$, which brings us to the promised surprising
appearance of logarithms. Of course, this does satisfy the two conditions and
we will show, as no doubt he did, that it satisfies the recurrence relation
also; but what mysterious thought process he used to arrive at the solution is
hard to imagine.

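So, if $\omega(x) = \log_2(1 + x)$,

$$\begin{aligned}
\sum_{k=1}^{N} \left(\omega\left(\frac{1}{k}\right) - \omega\left(\frac{1}{k + x}\right)\right) &= \sum_{k=1}^{N} \log_2\left(\frac{k + 1}{k} \times \frac{k + x}{k + x + 1}\right) \\
&= \log_2 \prod_{k=1}^{N} \left(\frac{k + 1}{k} \times \frac{k + x}{k + x + 1}\right) \\
&= \log_2\left(\left(\frac{2}{1} \times \frac{1 + x}{2 + x}\right)\left(\frac{3}{2} \times \frac{2 + x}{3 + x}\right) \cdots \left(\frac{N + 1}{N} \times \frac{N + x}{N + x + 1}\right)\right) \\
&= \log_2 \frac{(1 + x)(N + 1)}{N + x + 1} \to \log_2(1 + x) \quad\text{as } N \to \infty.
\end{aligned}$$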
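Gauss's formula can also be tested empirically. Start from points drawn uniformly in $(0, 1]$, which was the starting assumption of Gauss's problem. Then apply the shift $t \mapsto \{1/t\}$, which discards the leading partial quotient, a few times. The sketch below does this in floating point. That is adequate for eight shifts, although round-off would eventually swamp longer runs. The parameters and names are our own.

```python
import math
import random


def shifted_fractions(iterations: int = 8, samples: int = 100_000, seed: int = 3) -> list:
    """Uniform points in (0, 1] after `iterations` applications of t -> frac(1/t).

    A start [0; a_1, a_2, ...] becomes the tail [0; a_{n+1}, a_{n+2}, ...] after n shifts.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(samples):
        t = 1.0 - rng.random()  # uniform on (0, 1]
        for _ in range(iterations):
            if t == 0.0:  # landed exactly on a rational endpoint (a floating-point rarity)
                break
            t = (1.0 / t) % 1.0
        results.append(t)
    return results


def empirical_cdf(values: list, x: float) -> float:
    """Proportion of the values below x."""
    return sum(v < x for v in values) / len(values)


if __name__ == "__main__":
    values = shifted_fractions()
    for x in (0.25, 0.5, 0.75):
        print(x, round(empirical_cdf(values, x), 4), round(math.log2(1 + x), 4))
```

The empirical proportions sit close to $\log_2(1 + x)$ rather than to the uniform value $x$.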

What Gauss could not do was to forge these ideas into the statement

$$P([0; a_{n+1}, a_{n+2}, a_{n+3}, \ldots] < x) = \omega_n(x) = \log_2(1 + x) + \varepsilon_n, \qquad \varepsilon_n \to 0,$$

and therefore rigorously produce what might be thought of as his 'second statistical
distribution'. (The first is the ubiquitous 'Gaussian' or 'Normal' or
'Error' distribution, which Gauss used in 1809 to analyse astronomical
data; it had already been used by Laplace in 1783 to investigate errors in measurement and
came into being through the work of de Moivre, who in 1733 developed it as
an approximation to the Binomial Distribution.)

In the end, the problem was solved independently by two mathematicians.
In 1928 R. O. Kuzmin showed that, for almost all continued fractions, $\varepsilon_n =
O(q^{\sqrt{n}})$, where $0 < q < 1$, and in 1929 Paul Lévy (1886-1971) showed in a
completely different way that $\varepsilon_n = O(q^n)$, where $q = 0.7$; we have error
terms that are not only relatively small but asymptotically zero.
Table 14.6. Partial quotient distribution for almost all continued fractions.

For large n, % probabilities for partial quotients

k               1    2    3    4    5    6    7    8   9+
P(a_n = k)     42   17    9    6    4    3    2    2   15


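From this incredible result we can find another: the probability distribution of
the partial quotients

$$\begin{aligned}
P(a_n = k) &= P(k \le \xi_n < k + 1) = P(\xi_n < k + 1) - P(\xi_n < k) \\
&= \omega_{n-1}\left(\frac{1}{k}\right) - \omega_{n-1}\left(\frac{1}{k + 1}\right) \\
&\approx \log_2\left(1 + \frac{1}{k}\right) - \log_2\left(1 + \frac{1}{k + 1}\right) \\
&= \log_2 \frac{(k + 1)^2}{k(k + 2)} = \log_2 \frac{k(k + 2) + 1}{k(k + 2)} \\
&= \log_2\left(1 + \frac{1}{k(k + 2)}\right),
\end{aligned}$$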

which gives rise to Table 14.6. We can check that it is indeed a probability
distribution:

$$\begin{aligned}
\sum_{k=1}^{N} \log_2\left(1 + \frac{1}{k(k + 2)}\right) &= \sum_{k=1}^{N} \log_2 \frac{(k + 1)^2}{k(k + 2)} \\
&= \sum_{k=1}^{N} \{2\log_2(k + 1) - \log_2 k - \log_2(k + 2)\} \\
&= \sum_{k=1}^{N} \{\log_2(k + 1) - \log_2 k\} + \sum_{k=1}^{N} \{\log_2(k + 1) - \log_2(k + 2)\} \\
&= \log_2(N + 1) + \log_2 2 - \log_2(N + 2) \\
&= \log_2 2 + \log_2\left(\frac{N + 1}{N + 2}\right) \to \log_2 2 = 1 \quad\text{as } N \to \infty,
\end{aligned}$$

with the terms of the two series cancelling.
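The distribution and its telescoping sum can be checked numerically. The sketch also evaluates the tail $P(a_n \ge k)$ in closed form as $\log_2((k+1)/k)$, which follows from the partial sum $\log_2(2(N+1)/(N+2))$ with $N = k - 1$. In particular, $P(a_n = 1) = \log_2(4/3) \approx 0.415$ and $P(a_n \ge 9) = \log_2(10/9) \approx 0.152$. The names are our own.

```python
import math


def gauss_kuzmin(k: int) -> float:
    """Limiting probability that a partial quotient equals k: log2(1 + 1/(k(k+2)))."""
    return math.log2(1 + 1 / (k * (k + 2)))


def prob_at_least(k: int) -> float:
    """P(a_n >= k), from the telescoping sum: 1 - log2(2k/(k+1)) = log2((k+1)/k)."""
    return math.log2((k + 1) / k)


if __name__ == "__main__":
    row = [round(100 * gauss_kuzmin(k)) for k in range(1, 9)] + [round(100 * prob_at_least(9))]
    print(row)  # whole-number percentages for k = 1, ..., 8 and 9+
```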

For example, this tells us that, for the partial quotients of $\gamma$,

$$P(a_n = 11\,626) = \log_2\left(1 + \frac{1}{11\,626 \times 11\,628}\right) \approx 1.07 \times 10^{-8}.$$
Table 14.7. Frequency of digits in 1000 partial quotients of γ.

k               1    2    3    4    5    6    7    8   9+
Count         417  168   75   57   41   33   22   19  168
Actual (%)     42   17    8    6    4    3    2    2   17


Table 14.7 provides ample evidence that $\gamma$ behaves as 'almost any' number,
yet $e$ must be exceptional since, after the leading 2, 1 is the only odd number appearing in its
continued-fraction expansion and every even number appears once and only
once; evidently, the Golden Ratio $\phi$, whose partial quotients are all 1, is exceptional too.
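The patterns for $e$ and $\phi$ can be seen by expanding good rational approximations to them. As stand-ins we use the partial sum $\sum_{k<40} 1/k!$ for $e$ and a ratio of consecutive Fibonacci numbers for $\phi$. This is our own assumption, but the approximations are far more accurate than needed for the first 25 partial quotients. The names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partial_quotients(x: Fraction, count: int) -> list:
    """The first `count` partial quotients [a0; a1, a2, ...] of a positive rational x."""
    quotients = []
    for _ in range(count):
        a = x.numerator // x.denominator  # floor of x, since x > 0
        quotients.append(a)
        remainder = x - a
        if remainder == 0:  # a rational number has a finite expansion
            break
        x = 1 / remainder
    return quotients


# Partial sum of 1/k!, within about 1e-48 of e
E_APPROX = sum(Fraction(1, factorial(k)) for k in range(40))


def fibonacci_ratio(n: int) -> Fraction:
    """F_{n+1}/F_n, a rational approximation to the Golden Ratio."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return Fraction(b, a)


if __name__ == "__main__":
    print(partial_quotients(E_APPROX, 25))
    print(partial_quotients(fibonacci_ratio(40), 25))
```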

Now that we have a probability distribution, it is natural to ask what the
average of the $a_n$ is, and here is another surprise: there isn't one, as we can see
from the following argument. By definition, the average value is

$$\sum_{k=1}^{\infty} kP(a_n = k) = \sum_{k=1}^{\infty} k\log_2\left(1 + \frac{1}{k(k + 2)}\right),$$

which seems fine, but as $k$ becomes large, $k(k + 2) \approx k^2$ and

$$\log_2\left(1 + \frac{1}{k(k + 2)}\right) \approx \log_2\left(1 + \frac{1}{k^2}\right) = \frac{1}{\ln 2}\ln\left(1 + \frac{1}{k^2}\right) \approx \frac{1}{\ln 2}\,\frac{1}{k^2},$$

which makes

$$\sum_{k \text{ large}} kP(a_n = k) \approx \sum_{k \text{ large}} k \times \frac{1}{\ln 2}\,\frac{1}{k^2} = \frac{1}{\ln 2}\sum_{k \text{ large}} \frac{1}{k},$$

and the divergent harmonic series makes another surprising (and unwelcome)
appearance. Of course, this analysis does not work for $\phi$ and $e$, although it is
obvious that the average partial quotient for $\phi$ is 1. It is undefined for $e$, as we can
see if we reason that adding the partial quotients means adding pairs of 1s, which
is linear in $n$, and the arithmetic series $2 + 4 + 6 + \cdots$, which is quadratic in
$n$; division by $n$ will leave something of the order of $n$ and be divergent.
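The divergence is slow but can be seen numerically. The sketch truncates the 'average' at $N$ terms and compares $N = 10^5$ with $N = 10^6$. Since the terms behave like $1/(k \ln 2)$, each tenfold increase in $N$ should add roughly $\log_2 10 \approx 3.32$. We use `math.log1p` for accuracy when $1/(k(k+2))$ is tiny. The function name is our own.

```python
import math


def truncated_mean(N: int) -> float:
    """sum_{k=1}^{N} k * P(a_n = k), with P(a_n = k) = log2(1 + 1/(k(k+2)))."""
    return sum(k * math.log1p(1 / (k * (k + 2))) for k in range(1, N + 1)) / math.log(2)


if __name__ == "__main__":
    for N in (10, 10**3, 10**5, 10**6):
        print(N, round(truncated_mean(N), 4))
```

The printed values keep growing, by about 3.32 per factor of 10, instead of settling down.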

Even though the arithmetic mean is not properly defined for the $a_n$, Aleksandr
Khinchin (whom we mentioned earlier on p. 140) proved that the geometric
mean does converge, and that for almost all numbers $(a_1a_2\cdots a_n)^{1/n} \to
K = 2.685\,45\ldots$, which is appropriately known as Khinchin's constant; the
plots in Figure 14.12 suggest that $\gamma$, $\pi$ and $K$ itself obey Khinchin's law.

The geometric mean for $\phi$ is obviously 1 and for $e$ it grows without bound, which
can be seen using Stirling's approximation, which we developed in Chapter 10.
Recall that to a first approximation it states that $n! \approx \sqrt{2\pi n}\,n^n e^{-n}$.
Figure 14.12. The tendency to Khinchin's constant.

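An examination of the pattern in the continued fraction form of $e = [2; 1, 2, 1, 1, 4, 1, 1, 6, \ldots]$ shows that

$$\prod_{k=1}^{3N-1} a_k = \prod_{k=1}^{3N} a_k = \prod_{k=1}^{3N+1} a_k = 2^N N!,$$

so if $n = 3N$,

$$\begin{aligned}
\left(\prod_{k=1}^{n} a_k\right)^{1/n} = (2^N N!)^{1/(3N)} &\approx \left(2^N \sqrt{2\pi N}\,N^N e^{-N}\right)^{1/(3N)} \\
&= (\sqrt{2\pi})^{1/(3N)}\,N^{1/(6N)} \left(\frac{n}{3}\right)^{1/3} 2^{1/3} e^{-1/3} \\
&\sim 1 \times 1 \times \left(\frac{2n}{3e}\right)^{1/3} = 0.6259\cdots n^{1/3} \quad\text{as } N \to \infty,
\end{aligned}$$

which diverges to $\infty$.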

The Khinchin result can be pushed a little further if we recall the use of the
harmonic series in measuring the independence of record events, as discussed
on p. 125. With almost all continued fractions the geometric means of the $a_n$
will fluctuate around and home in on $K$, and it makes sense to record the $n$ for
which the geometric mean of the $a_n$ is the 'best yet' in approximating $K$; for
example, with $K$ itself the sequence starts

1, 2, 3, 15, 23, 26, 81, 104, 109, 111, 120, 127, 135, 136, 141, 142,
144, 145, 146, 147, 148, 5920, 5943, 8381, 8401, 89 953, 91 368, ...

So, over 91 368 partial quotients we have 27 records and $H_{91\,368} \approx 12$; the same
calculations for $\pi$ show that there are 27 records up to 4 497 058 partial quotients
and $H_{4\,497\,058} \approx 16$, which suggests an unsurprising dependence among the
partial quotients in both cases.
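The benchmark $H_n$ is the expected number of 'best yet' records among $n$ independent, continuously distributed values. The sketch computes the two harmonic numbers quoted above. It also checks the record-count interpretation by simulation with uniform random values, which is our own choice of test distribution. All names are hypothetical.

```python
import math
import random


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return math.fsum(1 / k for k in range(1, n + 1))


def count_records(values) -> int:
    """Number of 'best yet' values reading left to right (smaller counts as better)."""
    best = math.inf
    records = 0
    for v in values:
        if v < best:
            best, records = v, records + 1
    return records


def mean_records(n: int, trials: int = 2000, seed: int = 5) -> float:
    """Average record count for n independent uniform values (should be close to H_n)."""
    rng = random.Random(seed)
    return sum(count_records(rng.random() for _ in range(n)) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(round(harmonic(91_368), 2), round(harmonic(4_497_058), 2))
    print(round(mean_records(1000), 2), round(harmonic(1000), 2))
```

Independence would predict about 12 and 16 records, far fewer than the 27 observed in each case.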

If we recall that one definition of the statistical independence of two events $A$ and
$B$ is $P(A \text{ and } B) = P(A) \times P(B)$, we can quantify this suspicion since, using
the distribution above, it can be shown that the partial quotients are 'weakly
dependent' in that

$$P(a_n = r \text{ and } a_{n+k} = s) = P(a_n = r) \times P(a_{n+k} = s) \times (1 + O(q^k)),$$

where $0 < q < 1$.

Figure 14.13. Another Khinchin constant.

The curious 2.685 45... that is Khinchin's constant is in fact

$$K = \prod_{r=1}^{\infty}\left(1 + \frac{1}{r(r + 2)}\right)^{\ln r/\ln 2},$$

which Khinchin identified by proving the general result that, if $f(r)$ is a sufficiently
well-behaved function defined on the positive integers, then

$$\frac{1}{n}\sum_{r=1}^{n} f(a_r) \to \frac{1}{\ln 2}\sum_{r=1}^{\infty} f(r)\ln\left(1 + \frac{1}{r(r + 2)}\right) \quad\text{as } n \to \infty.$$


His constant results from taking $f(r) = \ln r$. Of course, all manner of choices
of $f(r)$ are available, and picking $f(r) = 1/r$, generalizing the harmonic mean
from p. 121, and rewriting gives

$$\frac{n}{\sum_{r=1}^{n}(1/a_r)} \to \frac{\ln 2}{\sum_{r=1}^{\infty}(1/r)\ln\left(1 + 1/(r(r + 2))\right)} = 1.745\,405\,68\ldots,$$

and we have the harmonic mean of almost all continued fractions also converging
to a limit independent of the fraction itself, as we can see in Figure 14.13.
In this case the limit appears to have no name attached to it; perhaps we should
call it Khinchin's second constant.
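Both limits can be evaluated from Khinchin's general result by truncating the series. The sketch uses $f(r) = \ln r$ for $K$ and $f(r) = 1/r$ for the harmonic-mean limit. The cut-off of 200 000 terms is our own choice. It leaves $K$ slightly too small, by a few parts in $10^4$, because the $\ln r / r^2$ tail converges slowly. The function name is hypothetical.

```python
import math


def khinchin_limits(terms: int = 200_000) -> tuple:
    """Truncated series for Khinchin's constant K and for the harmonic-mean limit."""
    ln_k = 0.0  # ln K = sum over r of ln(r) * P(a_n = r)
    reciprocal = 0.0  # sum over r of (1/r) * P(a_n = r)
    for r in range(1, terms + 1):
        p = math.log1p(1 / (r * (r + 2))) / math.log(2)  # P(a_n = r), computed accurately
        ln_k += math.log(r) * p
        reciprocal += p / r
    return math.exp(ln_k), 1 / reciprocal


if __name__ == "__main__":
    K, H = khinchin_limits()
    print(f"K = {K:.4f}, harmonic-mean limit = {H:.6f}")
```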
CHAPTER FIFTEEN


Problems with Primes 


Mathematicians have tried in vain to this day to discover some order in the 
sequence of prime numbers, and we have reason to believe that it is a mystery 
into which the mind will never penetrate. 

Leonhard Euler 


15.1 Some Hard Questions about Primes 

Prime numbers have appeared several times in this book. Their study has long
held centre stage in number theory and their behaviour, at times seemingly
so undisciplined, can sometimes appear determined by an unknown, powerful
authority unwilling to disclose its design. The leading quotation makes evident
the great Euler's frustration; Erdős, paraphrasing Einstein, said 'God may not
play dice with the Universe, but there's something strange going on with the
prime numbers!' and R. C. Vaughan spoke for many when he said, 'It is evident
that the primes are randomly distributed but, unfortunately, we don't know
what random means.' These are three among very many quotations, made across the
centuries, which together capture the wonder with which the behaviour of
the primes is regarded.

Of all the questions that can be asked, perhaps the three most fundamental
are the following.

(1) Is a given number prime?

(2) How many primes are there less than or equal to a given number $x$?

(3) What is the $x$th prime, $p_x$?

They are easily answered for small numbers: 101 is prime, the 50th prime
is 229 and there are 1229 primes less than 10 000, but the going gets much
tougher as the numbers get bigger and, after all, we know that there is an infinity
of primes. Is 252 097 800 623 prime? How many primes are there less than
100 000 000 000 000 000 000? What is the 1 000 000 000 000 000 000th prime?
These questions are not nearly so straightforward to answer, and these are still
'small' numbers.
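For small numbers all three questions yield to the sieve of Eratosthenes. The sketch below confirms the small answers quoted above, together with the values $\pi(3) = 2$, $\pi(17) = 7$ and $\pi(22) = 8$ used later. The names `primes_up_to`, `PRIMES` and `prime_count` are our own.

```python
from bisect import bisect_right


def primes_up_to(limit: int) -> list:
    """All primes <= limit, by the sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # cross off p*p, p*p + p, ... (smaller multiples were crossed off by smaller primes)
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]


PRIMES = primes_up_to(10_000)


def prime_count(x: int) -> int:
    """pi(x) for x <= 10 000, by binary search in the precomputed list."""
    return bisect_right(PRIMES, x)


if __name__ == "__main__":
    print(101 in PRIMES, PRIMES[49], prime_count(10_000))
```

The sieve's memory grows linearly with the limit, so it is no help for the much larger numbers above.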

We will not dwell on the first question, but the reader will need little convincing
that the methods to test for primeness of large numbers are far more
subtle than trying to divide the candidate by all primes less than its square
root. The question is linked to finding the largest known prime, a search that
is inevitably focused on Mersenne primes, mentioned on p. 116, named after
the 17th-century friar Marin Mersenne and which are of the form $2^p - 1$ with
$p$ prime, since for such candidates something called the Lucas-Lehmer test
is available. On 5 December 2001 the Great Internet Mersenne Prime Search
(GIMPS) initiative found the latest such monster: $2^{13\,466\,917} - 1$ is prime. The
number has 4 053 946 digits!
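The Lucas-Lehmer test is short enough to state as code. For an odd prime $p$, $2^p - 1$ is prime exactly when $s_{p-2} \equiv 0 \pmod{2^p - 1}$, where $s_0 = 4$ and $s_{i+1} = s_i^2 - 2$. The sketch below applies it to small exponents. It also counts the decimal digits of $2^p - 1$ as $\lfloor p \log_{10} 2 \rfloor + 1$, which is valid because $2^p$ is never a power of 10. The function names are our own.

```python
import math


def is_prime_small(n: int) -> bool:
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def lucas_lehmer(p: int) -> bool:
    """For prime p, decide whether M_p = 2**p - 1 is prime (Lucas-Lehmer test)."""
    if p == 2:
        return True  # M_2 = 3; the iteration below assumes p is odd
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def decimal_digits_of_mersenne(p: int) -> int:
    """Number of decimal digits of 2**p - 1, the same as for 2**p."""
    return math.floor(p * math.log10(2)) + 1


if __name__ == "__main__":
    print([p for p in range(2, 130) if is_prime_small(p) and lucas_lehmer(p)])
    print(decimal_digits_of_mersenne(13_466_917))
```

Running the test for an exponent like 13 466 917 needs far faster multiplication than this sketch provides. The digit count, however, is immediate.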

To approach the second question (and, it will turn out, the third too) we will
adopt the standard notation $\pi(x)$ for the function which gives the number of
primes less than or equal to $x$, which is known as the 'prime counting function';
so, remembering that 2 is prime and 1 is not, $\pi(3) = 2$, $\pi(17) = 7$ and
$\pi(22) = 8$, etc. Clearly, $\pi(x)$ is a non-decreasing step function of $x$ and, since
there is an infinite number of primes, we know that $\pi(x) \to \infty$ as $x \to \infty$,
but how quickly? The identification of the precise nature of $\pi(x)$ has become
known as the Prime Number Theorem and through it we will see how intimately
the primes are linked to logarithms and how very remarkable that fact is. In the
words of L. J. Goldstein,

The history of the Prime Number Theorem provides a beautiful 
example of the way in which great ideas develop and interrelate, 
feeding upon one another ultimately to yield a coherent theory 
which rather completely explains observed phenomena. 


15.2 A Modest Start 

A closer look at Euclid's argument proving the infinity of primes gives us a
first (and very poor) lower bound on the size of $\pi(x)$. Although we used the first
$n$ primes in the original argument on p. 28, it is clear that $P_n = 1 + p_1p_2\cdots p_n$
can be constructed from any set of $n$ primes and of course may or may not
itself be prime; whatever the case, let $p_{n+1}$ be the smallest prime dividing
$P_n$. Then $p_{n+1} \le P_n = 1 + p_1p_2\cdots p_n < 2p_1p_2\cdots p_n$, a huge and costly
overestimate. Now suppose that we take $p_1 = 2$; then $p_2 \le 2p_1 = 2 \times 2 = 2^2$,
$p_3 \le 2p_1p_2 = 2 \times 2 \times 2^2 = 2^4$, $p_4 \le 2p_1p_2p_3 = 2 \times 2 \times 2^2 \times 2^4 = 2^8$ and in
general $p_{n+1} \le 2^{2^n}$, which is an estimate for the size of the $(n+1)$th prime. Since
$p_k \le 2^{2^{k-1}} \le 2^{2^n}$ for all $k = 1, 2, \ldots, n + 1$, and these $n + 1$ primes are distinct (none of
$p_1, \ldots, p_n$ divides $P_n$), it must be that $p_1, p_2, \ldots, p_n, p_{n+1} \le 2^{2^n}$.
This means that $\pi(2^{2^n}) \ge n + 1$. Now write $x = 2^{2^n}$, and so $n = \log_2\log_2 x$,
to get $\pi(x) \ge \log_2\log_2 x + 1 > \log_2\log_2 x$. Clearly, this inequality will also
hold for all $x \ge 2$, since if $2^{2^n} \le x < 2^{2^{n+1}}$ then $\pi(x) \ge n + 1 > \log_2\log_2 x$,
and we have the bound $\pi(x) > \log_2\log_2 x$ and a first, early
appearance of logarithms.
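Comparing the bound with real prime counts shows just how weak it is. The sketch checks two things against a sieve up to $10^6$, a limit we chose for convenience. First, it checks $p_{n+1} \le 2^{2^n}$ for the actual primes. Second, it checks $\pi(x) > \log_2\log_2 x$ at powers of 10. The names are our own.

```python
import math
from bisect import bisect_right


def primes_up_to(limit: int) -> list:
    """All primes <= limit, by the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return [n for n, flag in enumerate(is_prime) if flag]


PRIMES = primes_up_to(10**6)


def euclid_bound_holds(count: int) -> bool:
    """Check p_{n+1} <= 2**(2**n) for the first `count` primes."""
    return all(PRIMES[n] <= 2 ** (2 ** n) for n in range(count))


def compare_with_loglog(x: int) -> tuple:
    """(pi(x), log2(log2(x))) for 2 <= x <= 10**6."""
    return bisect_right(PRIMES, x), math.log2(math.log2(x))


if __name__ == "__main__":
    for k in range(1, 7):
        pi_x, bound = compare_with_loglog(10**k)
        print(f"x = 10^{k}: pi(x) = {pi_x}, log2 log2 x = {bound:.2f}")
```

At $x = 10^6$ the bound is about 4.32, while $\pi(x) = 78\,498$.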

We can improve matters with a bit more work. 

Factorials and the floor function can be used to count the contribution to
$n!$ of each of its prime factors, which in turn has deeper implications, as we
will see later. To get an idea of what is happening, consider, for example,
$10! = 3\,628\,800 = 2^8 \times 3^4 \times 5^2 \times 7$; 2 appears 8 times, 3 appears 4 times, etc.,
and, of course, in theory we can factor specific higher factorials and answer the
same question, but it is neater and far more practical to consider the general
case. In the preliminaries to the co-prime proof in Chapter 8 we noted that there
are $\lfloor N/r \rfloor$ numbers up to and including $N$ which have $r$ as a divisor. So, for
a given prime $p \le n$, there are $\lfloor n/p \rfloor$ integers up to and including $n$ which are
divisible by $p$, and each of them contributes one factor $p$ to $n!$. Similarly,
$\lfloor n/p^2 \rfloor$ of them are divisible by $p^2$ and contribute a second factor $p$, $\lfloor n/p^3 \rfloor$ contribute a third, and so on up to
$p^k$, where $p^{k+1} > n$. The total exponent of $p$ in $n!$ can
then be conveniently expressed as

$$e_p(n!) = \sum_{r=1}^{\infty} \left\lfloor \frac{n}{p^r} \right\rfloor,$$

where the terms of the seemingly infinite series are zero for $r \ge k + 1$.
This means that

$$n! = \prod_{p \le n} p^{e_p(n!)} = \prod_{p \le n} p^{\sum_{r=1}^{\infty} \lfloor n/p^r \rfloor},$$

a result attributed to Legendre, whom we saw contribute to the theory of the
Gamma function and whom we will meet again later in the chapter.

It is this expression that we will use to estimate $\pi(x)$, but before we do we will
take a quick look at its contribution to the solution of a well-known problem,
since there is no added cost in doing so: how many zeros end a given factorial?
For example, we see from above that 10! ends with just two zeros. To answer
this in a systematic way we can use the above result to establish how many
times 2 and 5 each appear in 10! and then take the smaller of the two numbers
to give the number of times that the factor 10 = 2 × 5 appears and therefore the number of
zeros with which the number ends.

We have then that 2 appears

$$\left\lfloor \frac{10}{2} \right\rfloor + \left\lfloor \frac{10}{2^2} \right\rfloor + \left\lfloor \frac{10}{2^3} \right\rfloor = 5 + 2 + 1 = 8$$

times and 5 appears $\lfloor 10/5 \rfloor = 2$ times; 10 therefore appears 2 times and 10! must end with two zeros, as we can see from the direct calculation above.

Table 15.1. A comparison of the estimates.

$x$        $\pi(x)$               $\log_2 \log_2 x$     $\frac{1}{x}\log_2 x!$
10^6       78 498                 4.32                  18.49
10^7       664 579                4.54                  21.8
10^8       5 761 455              4.73                  25.1
10^9       50 847 534             4.90                  28.5
10^10      455 052 511            5.05                  31.8
10^11      4 118 054 813          5.20                  35.1
10^12      37 607 912 018         5.32                  38.4
10^13      346 065 536 839        5.43                  41.7

Put to greater use, for 1000!, 2 appears


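$$\left\lfloor \frac{1000}{2} \right\rfloor + \left\lfloor \frac{1000}{2^2} \right\rfloor + \left\lfloor \frac{1000}{2^3} \right\rfloor + \cdots + \left\lfloor \frac{1000}{2^9} \right\rfloor = 500 + 250 + 125 + 62 + 31 + 15 + 7 + 3 + 1 = 994$$

times and 5 appears

$$\left\lfloor \frac{1000}{5} \right\rfloor + \left\lfloor \frac{1000}{5^2} \right\rfloor + \left\lfloor \frac{1000}{5^3} \right\rfloor + \left\lfloor \frac{1000}{5^4} \right\rfloor = 200 + 40 + 8 + 1 = 249$$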


times and so 1000! ends with 249 zeros. It is, of course, the number of times 
that 5 appears that determines the number of zeros. 
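The counting argument translates directly into a few lines of code. The following is a minimal Python sketch of Legendre's formula and of the trailing-zeros rule; the function names `legendre_exponent` and `trailing_zeros` are our own, hypothetical choices.

```python
def legendre_exponent(n: int, p: int) -> int:
    """Return e_p(n!), the exponent of the prime p in n!.

    Legendre's formula: e_p(n!) = floor(n/p) + floor(n/p^2) + ...,
    where every term vanishes once p^r > n.
    """
    total = 0
    power = p
    while power <= n:
        total += n // power  # number of multiples of p^r up to n
        power *= p
    return total


def trailing_zeros(n: int) -> int:
    """Zeros ending n!: each factor 10 = 2 x 5 needs one 2 and one 5."""
    return min(legendre_exponent(n, 2), legendre_exponent(n, 5))


print(legendre_exponent(10, 2), legendre_exponent(10, 5), trailing_zeros(10))  # 8 2 2
print(legendre_exponent(1000, 2), legendre_exponent(1000, 5), trailing_zeros(1000))  # 994 249 249
```

The same function can be used to spot-check the bound $e_p(n!) < n/(p-1)$ for particular values of $n$ and $p$.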

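To apply Legendre’s result to estimate $\pi(x)$ we do the following.

$$e_p(n!) = \left\lfloor \frac{n}{p} \right\rfloor + \left\lfloor \frac{n}{p^2} \right\rfloor + \left\lfloor \frac{n}{p^3} \right\rfloor + \cdots,$$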

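where the series eventually terminates. We can find an upper bound for $e_p(n!)$ by removing the $\lfloor \; \rfloor$ function and allowing the resulting geometric series to extend to infinity to get

$$e_p(n!) < \frac{n}{p} + \frac{n}{p^2} + \frac{n}{p^3} + \cdots = \frac{n}{p}\left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right) = \frac{n}{p} \cdot \frac{1}{1 - 1/p} = \frac{n}{p - 1},$$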


which makes $p^{e_p(n!)} < p^{n/(p-1)}$. Since for any number $n \ge 2$, $n \le 2^{n-1}$, we have that $p^{e_p(n!)} < p^{n/(p-1)} \le (2^{p-1})^{n/(p-1)} = 2^n$ and $n! < (2^n)^{\pi(n)} = 2^{n\pi(n)}$. Taking logs to the base 2 we have that $n\pi(n) > \log_2 n!$ and $\pi(n) > (1/n)\log_2 n!$, our new estimate. This takes a bit of calculating for large $n$, but since we have Stirling’s approximation we can estimate well enough for our purposes by using only the first term of the approximation and taking base 2 logarithms of each side to get

$$\pi(n) > \frac{1}{n}\log_2 n! \approx \log_2 \frac{n}{e}$$

for large $n$.

Figure 15.1. (Marked: $p_k$ and $2^{k+1} = 2 \times 2^k$.)

We can now compile Table 15.1 to see just how bad these estimates really 
are. On the bright side, at least they are valid bounds and through these ideas 
we have exercised some small control over the distribution of primes. 
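Table 15.1 can be partly reproduced with a short computation. The sketch below is an illustration only: it counts primes with an ordinary sieve and evaluates the two lower bounds, using the standard-library `math.lgamma` to obtain $\ln x!$ without forming $x!$. The function names are hypothetical.

```python
import math


def prime_count(limit: int) -> int:
    """pi(limit), counted with a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # cross out p*p, p*p + p, ... by assigning a run of zero bytes
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sum(is_prime)


def double_log_bound(x: float) -> float:
    """The first lower bound, log2(log2 x)."""
    return math.log2(math.log2(x))


def factorial_bound(x: int) -> float:
    """The second lower bound, (1/x) log2(x!), with ln(x!) = lgamma(x + 1)."""
    return math.lgamma(x + 1) / math.log(2) / x


for k in range(3, 7):
    x = 10 ** k
    print(x, prime_count(x), round(double_log_bound(x), 2), round(factorial_bound(x), 2))
```

For $x = 10^6$ the three printed values are 78 498, 4.32 and 18.49, the first row of Table 15.1.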

We now have two lower bounds on $\pi(x)$. The argument on p. 164 has already provided an upper bound on the size of the $n$th prime, and this can be significantly sharpened using the Bertrand Conjecture once more (mentioned on p. 25), since if we write the primes in ascending order as $p_1, p_2, \ldots, p_n$, the conjecture implies that $p_n \le 2^n$ (of course, $p_1 = 2 = 2^1$, but the inequality is strict from then on). The easiest way to see this is to use induction, referring to Figure 15.1: suppose that for some $k$, $p_k \le 2^k$; then $p_{k+1}$ lies either in the interval $(p_k, 2^k)$, in which case $p_{k+1} < 2^k < 2^{k+1}$, or it lies to the right of $2^k$, in which case it must be the first prime that is guaranteed to be in the interval $(2^k, 2^{k+1})$, and again it must be that $p_{k+1} < 2^{k+1}$ and the induction is complete.

15.3 A Sort of Answer 

Of course, what we would like is to find an explicit expression for $\pi(x)$ in terms of $x$ and, if we are not too choosy, this is readily accomplished. In fact, there are any number of such formulae and a large class of them relies on a result of number theory known as Wilson’s Theorem. In 1770, the Cambridge mathematician Edward Waring (1736-1798) published the work Meditationes Algebraicae, in which he announced a number of new results of number theory; foremost among them was the statement that if $p$ is prime, then $p$ divides $(p - 1)! + 1$. He attributed it to his former student (and Senior Wrangler), John Wilson (1741-1793), who posited the result on the basis of empirical evidence. No proof was provided. In the publication, Waring admitted to failure in supplying the proof, adding in the text, ‘Theorems of this kind will be very hard to prove, because of the absence of a notation to express prime numbers’, a comment which failed to impress the great Gauss, who, on reading it, is said to have uttered ‘notationes versus notiones’ (notations versus notions), implying that it was the notion that really mattered, not the notation. In fact, it took only until 1773 for Lagrange to provide the proof of the statement (and of its converse), yet it has passed into mathematical lore as Wilson’s Theorem; another example of mathematical serendipity. It is even possible that it should carry the name of the mathematical giant Leibniz, as in his unpublished posthumous papers there are calculations closely related to the idea.

Assuming the truth of Wilson’s Theorem, we can give some sort of answers to 
the last two questions and do so by referring to an article by C. P. Willans in the 
December 1964 issue of the Mathematical Association’s journal Mathematical 
Gazette, which caused a little flurry of conflicting correspondence over the 
following three years and for that reason alone deserves our attention. 

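We have, as a direct consequence of Wilson’s Theorem, the function

$$F(n) = \left\lfloor \cos^2\left(\pi\,\frac{(n-1)! + 1}{n}\right) \right\rfloor = \begin{cases} 1, & n = 1 \text{ or } n \text{ prime}, \\ 0, & \text{otherwise}, \end{cases}$$

and, consequently,

$$\pi(x) = -1 + \sum_{n=1}^{x} \left\lfloor \cos^2\left(\pi\,\frac{(n-1)! + 1}{n}\right) \right\rfloor.$$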
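A direct, and deliberately naive, Python transcription of $F(n)$ and of the formula for $\pi(x)$ follows. One liberty is taken, and it is our assumption rather than part of Willans’s formula: because $\cos^2$ has period $\pi$, the numerator $(n-1)! + 1$ is first reduced modulo $n$, which leaves $F(n)$ unchanged but keeps the floating-point argument small. The function names are hypothetical.

```python
import math


def willans_F(n: int) -> int:
    """F(n) = floor(cos^2(pi * ((n - 1)! + 1) / n)): 1 exactly when
    n = 1 or n is prime (Wilson's Theorem and its converse), else 0."""
    # cos^2 has period pi, so only ((n - 1)! + 1) mod n affects the value.
    residue = (math.factorial(n - 1) + 1) % n
    return math.floor(math.cos(math.pi * residue / n) ** 2)


def willans_pi(x: int) -> int:
    """pi(x) = -1 + F(1) + F(2) + ... + F(x)."""
    return -1 + sum(willans_F(n) for n in range(1, x + 1))


print(willans_pi(100))  # 25
```

Every term needs a factorial, so this is hopeless beyond small $x$; it computes $\pi(x)$ correctly but is no practical method.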

To answer the third question, define the function

$$A_n(a) = \left\lfloor \left(\frac{n}{1 + a}\right)^{1/n} \right\rfloor, \qquad n = 1, 2, \ldots, \quad a = 0, 1, 2, \ldots$$

Since, for $a < n$, $1 \le n/(1 + a) \le n$, we have that $1 \le (n/(1 + a))^{1/n} \le n^{1/n} < 2$ and so $1 \le A_n(a) \le 1$, which of course forces $A_n(a) = 1$. Similarly, for $a \ge n$, $0 < n/(1 + a) < 1$ and so $0 \le A_n(a) \le 0$, which forces $A_n(a) = 0$. In summary, then

$$A_n(a) = \begin{cases} 1, & a < n, \\ 0, & a \ge n. \end{cases}$$

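We can therefore construct the formula

$$p_x = 1 + \sum_{r=1}^{N} A_x(\pi(r)),$$

where $N$ is any sufficiently large integer: $A_x(\pi(r)) = 1$ exactly when $\pi(r) < x$, that is, when $r < p_x$, so provided $N \ge p_x$ the sum counts the $p_x - 1$ integers $1, 2, \ldots, p_x - 1$. We could conveniently take $N = 2^x$ since $p_x \le 2^x$ for all $x$. The final formula is a typesetter’s nightmare when written in full,

$$p_x = 1 + \sum_{r=1}^{2^x} \left\lfloor \left( \frac{x}{\sum_{n=1}^{r} \left\lfloor \cos^2\left(\pi\,\frac{(n-1)! + 1}{n}\right) \right\rfloor} \right)^{1/x} \right\rfloor,$$

and it can be quite mysterious to see it at work. For example, Willans gives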

P5 = 1 + A 5 (jt(1)) + A 5 (jt(2)) + A 5 (jt(3)) + • • • + A 5 (tt(32)) 

= 1 + A 5 (0) + A 5 (1) + A 5 (2) + ... + A 5 (ll) 

= 1 + 1 + 1 + ... + 0 = 11 . 
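The prime-generating formula can be transcribed in the same spirit. This sketch builds $\pi(r)$ incrementally from $F$ rather than recomputing it for each $r$, an efficiency choice of ours that does not change any term of the sum; it reproduces Willans’s example $p_5 = 11$. As before, $(n-1)! + 1$ is reduced modulo $n$ before taking the cosine (our assumption, harmless because $\cos^2$ has period $\pi$), and the function names are hypothetical.

```python
import math


def willans_F(n: int) -> int:
    """1 if n = 1 or n is prime, else 0, via Wilson's Theorem."""
    residue = (math.factorial(n - 1) + 1) % n  # cos^2 has period pi
    return math.floor(math.cos(math.pi * residue / n) ** 2)


def willans_A(n: int, a: int) -> int:
    """A_n(a) = floor((n / (1 + a)) ** (1 / n)): 1 if a < n, 0 if a >= n."""
    return math.floor((n / (1 + a)) ** (1 / n))


def willans_prime(x: int) -> int:
    """p_x = 1 + sum_{r=1}^{2^x} A_x(pi(r))."""
    total = 1
    pi_r = -1  # running value of pi(r) = -1 + F(1) + ... + F(r)
    for r in range(1, 2 ** x + 1):
        pi_r += willans_F(r)
        total += willans_A(x, pi_r)
    return total


print(willans_prime(5))  # 11
```

The upper limit $2^x$ makes the work grow exponentially in $x$, which illustrates the practical objection that follows.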

There are other formulae like these, including another in the same article by Willans not involving $\lfloor \; \rfloor$. The results are novel but it is hard not to feel that this is not really answering the question in the proper spirit and, anyway, the formulae (and all others derived using the same sort of ideas) are in practice useless for the job for which they are intended.


15.4 Picture the Problem 


More realistically, the original question 2 is asking whether an approximation to $\pi(x)$ can be found in the form $\pi(x) = f(x) + \varepsilon_x$ for some easily computable function $f(x)$ and absolute error term $\varepsilon_x$, which we hope is not too big. To be more precise, we want the relative error to diminish asymptotically:

$$\lim_{x\to\infty} \frac{\pi(x) - f(x)}{\pi(x)} = \lim_{x\to\infty} \frac{\varepsilon_x}{\pi(x)} = 0.$$


So what is this $f(x)$? If we look at the graph of $\pi(x)$ for small $x$, we see an erratic step function that can do little to boost our confidence in finding it (see Figure 15.2). If we increase the range to $0 \le x \le 100$, the stepped effect is still evident but so is some sort of trend (see Figure 15.3). And for $0 \le x \le 1000$, the trend becomes clearer (see Figure 15.4). Finally, for $0 \le x \le 5000$, we get what appears to the eye to be nearly a straight line; it isn’t, of course (see Figure 15.5).

In fact, the curve which the eye superimposes on the graph is concave downwards since, although there is an infinite number of primes, they do become rarer as $x$ increases. The stepped effect is still there, it is simply hidden, and since there are arbitrarily large gaps between primes, the ‘run’ of the steps can be arbitrarily large for a ‘rise’ of 1. The easiest way to convince oneself of this is to realize that for any integer $n \ge 2$, the sequence $n! + 2, n! + 3, n! + 4, \ldots, n! + n$ contains no prime, since $n! + k$ is divisible by $k$ for each $2 \le k \le n$.
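The factorial argument is easy to check by machine. In this small sketch, `is_prime` is a plain trial-division helper of our own and `composite_run` is a hypothetical name; the code confirms that the $n - 1$ consecutive numbers $n! + 2, \ldots, n! + n$ contain no prime for a sample $n$.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate for this small illustration."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def composite_run(n: int) -> list[int]:
    """n! + 2, ..., n! + n; each n! + k (2 <= k <= n) is divisible by k."""
    f = math.factorial(n)
    return [f + k for k in range(2, n + 1)]


run = composite_run(10)
print(len(run), any(is_prime(m) for m in run))  # 9 False
```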
In 1975, in his inaugural lecture at the University of Bonn, Don Zagier commented:

There are two facts about the distribution of prime numbers of 
which I hope to convince you so overwhelmingly that they will 
be permanently engraved in your hearts. The first is that, despite 
their simple definition and role as the building blocks of the natural 
numbers, the prime numbers grow like weeds among the natural 
numbers, seeming to obey no other law than that of chance, and 
nobody can predict where the next one will sprout. The second fact 
is even more astonishing, for it states just the opposite: that the 
prime numbers exhibit stunning regularity, that there are laws
governing their behaviour, and that they obey these laws with almost
military precision. 

What are these laws that govern the primes’ behaviour? In particular, what is that $f(x)$?

15.5 The Sieve of Eratosthenes 

The Greek scholar Eratosthenes (276-194 B.C.) was a renowned chronicler of history. He was also chief librarian of the great library of Alexandria and measured the distance along the meridian from there to Assuan, which allowed the size of the Earth to be calculated with remarkable precision. Mathematicians remember him more for a device that methodically isolates primes, which has become known as his sieve; it allows the creation of a list of primes up to $x$ by knowing the primes up to $\sqrt{x}$, and without a single division.

To use it, we write down all of the integers up to $x$ and then, for each prime $p \le \sqrt{x}$, cross out every $p$th integer beyond the first appearance of $p$: every second integer beyond 2, every third beyond 3, every fifth beyond 5, and so on; the remaining uncrossed integers (apart from 1) are the primes. Of course, using this new set the whole process can be repeated to find the primes between $x$ and $x^2$, $x^2$ and $x^4$, etc. For example, with $x = 50$ and using the primes 2, 3, 5 and 7 we have Figure 15.6.
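Here is a minimal Python rendering of the procedure just described, a sketch rather than an optimized sieve: for each prime $p \le \sqrt{x}$ it crosses out $2p, 3p, \ldots$ by stepping through the list, so no candidate is ever divided. The function name is hypothetical.

```python
import math


def eratosthenes(x: int) -> list[int]:
    """Primes up to x: for each prime p <= sqrt(x), cross out the multiples
    of p beyond p itself; the numbers from 2 upwards left uncrossed are prime."""
    crossed = [False] * (x + 1)
    for p in range(2, math.isqrt(x) + 1):
        if not crossed[p]:  # p survived every earlier sieving, so p is prime
            for multiple in range(2 * p, x + 1, p):
                crossed[multiple] = True
    return [n for n in range(2, x + 1) if not crossed[n]]


print(eratosthenes(50))
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```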

Since this isolates the primes, it is small surprise that it can be used to calculate $\pi(x)$, and Daniel Meissel (1826-1895) used it (actually, a refinement of it) to do just that. We mentioned him before on p. 64, and in 1870 he hugely increased the contemporary knowledge by showing that $\pi(10^8) = 5\,761\,455$. In 1885 he increased this to $\pi(10^9) = 50\,847\,478$, which was, unfortunately, 56 short of the correct number.

It is interesting to see how the process can be formalized and so realistically 
begin to deal with large numbers and once again we will use the Floor function 
and the inclusion-exclusion principle. 

Figure 15.6. Sieve of Eratosthenes.

Suppose that we fix on an integer $x$ and that the list of primes up to $\sqrt{x}$ is $2, 3, 5, \ldots, p_x$. Now modify the process by crossing out the prime as well as its multiples. The first sieving by 2 then crosses out $\lfloor x/2 \rfloor$ numbers and we are left with $x - \lfloor x/2 \rfloor$ of them. The second sieving by 3 crosses out all multiples of 3, but it will come across multiples of 6, which have already been eliminated, so we will have remaining $x - \lfloor x/2 \rfloor - \lfloor x/3 \rfloor + \lfloor x/(2 \times 3) \rfloor$.

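The reasoning continues for 5, where we have to compensate for the multiples of $2 \times 3 \times 5$, which have been added back once too often; it is really a direct application of inclusion-exclusion. This leaves

$$x - \left\lfloor \frac{x}{2} \right\rfloor - \left\lfloor \frac{x}{3} \right\rfloor - \left\lfloor \frac{x}{5} \right\rfloor + \left\lfloor \frac{x}{2 \times 3} \right\rfloor + \left\lfloor \frac{x}{2 \times 5} \right\rfloor + \left\lfloor \frac{x}{3 \times 5} \right\rfloor - \left\lfloor \frac{x}{2 \times 3 \times 5} \right\rfloor$$

numbers.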

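And so it continues to the prime $p_x$. We are left with the number 1 and all primes between $\sqrt{x}$ and $x$, that is, $\pi(x) - \pi(\sqrt{x}) + 1$ numbers, and so we have

$$\pi(x) - \pi(\sqrt{x}) + 1 = x - \left\lfloor \frac{x}{2} \right\rfloor - \left\lfloor \frac{x}{3} \right\rfloor - \left\lfloor \frac{x}{5} \right\rfloor - \cdots + \left\lfloor \frac{x}{2 \times 3} \right\rfloor + \left\lfloor \frac{x}{2 \times 5} \right\rfloor + \left\lfloor \frac{x}{3 \times 5} \right\rfloor + \cdots - \left\lfloor \frac{x}{2 \times 3 \times 5} \right\rfloor - \cdots,$$

with the dots indicating the extension described above.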

It is instructive to apply the formula for, say, $x = 100$ to get $\pi(100) = 25$.
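The inclusion-exclusion formula can be evaluated mechanically by summing over every set of primes up to $\sqrt{x}$, the empty set contributing $x$ itself. The sketch below (function names hypothetical) does exactly that. It is only practical for small $x$, because the number of terms doubles with each extra prime, which is presumably why refinements such as Meissel’s were needed for large $x$.

```python
import math
from itertools import combinations


def small_primes(limit: int) -> list[int]:
    """Primes up to limit by trial division (only used for limit <= sqrt(x))."""
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, math.isqrt(n) + 1))]


def legendre_pi(x: int) -> int:
    """pi(x) via pi(x) - pi(sqrt x) + 1 = sum over sets S of primes <= sqrt x
    of (-1)^|S| * floor(x / product of S)."""
    primes = small_primes(math.isqrt(x))
    survivors = 0  # 1 together with the primes in (sqrt x, x]
    for size in range(len(primes) + 1):
        for subset in combinations(primes, size):
            survivors += (-1) ** size * (x // math.prod(subset))
    return survivors + len(primes) - 1


print(legendre_pi(100))  # 25
```

For $x = 100$ the alternating sum is $100 - 117 + 45 - 6 + 0 = 22$, and adding $\pi(10) - 1 = 3$ gives 25.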


15.6 Heuristics 


We get further by being vaguer. Let’s not worry about the Floor function and the duplication and say that about half of the numbers will be divisible by 2 and so we are left with $(1 - \frac{1}{2})x$ after the first round of crossing out. About one-third of those will be divisible by 3, leaving $(1 - \frac{1}{3})(1 - \frac{1}{2})x$. About one-fifth of those will be divisible by 5, leaving $(1 - \frac{1}{5})(1 - \frac{1}{3})(1 - \frac{1}{2})x$, etc. If we repeat this for all of the primes $\le \sqrt{x}$, we will have approximately

$$\prod_{p \le \sqrt{x}} \left(1 - \frac{1}{p}\right) x$$

integers remaining, making, since $\pi(\sqrt{x}) - 1$ is comparatively negligible,

$$\pi(x) \approx x \prod_{p \le \sqrt{x}} \left(1 - \frac{1}{p}\right).$$
The error is building and we could do more to keep track of its size, but that 
would lead us away from the directions in which we wish to travel. 

Along one road, recall one of the two Mertens product formulae from p. 109:

$$\lim_{n\to\infty} \ln n \prod_{p \le n} \left(1 - \frac{1}{p}\right) = e^{-\gamma}.$$

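We can avoid using the limit notation and reorganize the result to the form

$$\prod_{p \le n} \left(1 - \frac{1}{p}\right) \approx \frac{e^{-\gamma}}{\ln n}$$

for large $n$. With $n = \sqrt{x}$ this gives the estimate

$$\pi(x) \approx \frac{x\,e^{-\gamma}}{\ln \sqrt{x}} = 2e^{-\gamma}\,\frac{x}{\ln x}$$

and the all-important expression $x/\ln x$ has made its first appearance.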
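To see how rough this first road is, the following sketch (names hypothetical) compares the sieve-product heuristic $x\prod_{p \le \sqrt{x}}(1 - 1/p)$ and its Mertens form $2e^{-\gamma}x/\ln x$ with an exact prime count. The value of $\gamma$ is hard-coded, since Python’s `math` module does not provide it as a named constant.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma (hard-coded)


def primes_up_to(n: int) -> list[int]:
    """All primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def sieve_product_estimate(x: int) -> float:
    """x * prod_{p <= sqrt x} (1 - 1/p): the heuristic count of survivors."""
    product = 1.0
    for p in primes_up_to(math.isqrt(x)):
        product *= 1 - 1 / p
    return x * product


def mertens_estimate(x: float) -> float:
    """2 e^{-gamma} x / ln x, from Mertens' formula with n = sqrt x."""
    return 2 * math.exp(-EULER_GAMMA) * x / math.log(x)


x = 10 ** 6
print(len(primes_up_to(x)), round(sieve_product_estimate(x)), round(mertens_estimate(x)))
```

Both estimates come out a few per cent above $\pi(10^6) = 78\,498$: the heuristic points at $x/\ln x$ but does not get the constant right, since $2e^{-\gamma} \approx 1.12$ rather than 1.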

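Choosing a second (even more bumpy) road, imagine $\pi(x)$ being differentiable for very large $x$, or approximated accurately by that smooth curve suggested by Figure 15.5, which we will call by the same name; then from above, the density of primes near $x$ is

$$\pi'(x) \approx \prod_{p \le \sqrt{x}} \left(1 - \frac{1}{p}\right).$$

Now let $h$ be the average interval between primes around $\sqrt{x}$; then, by the definition of tangent, $\pi'(\sqrt{x}) \approx 1/h$. The expression $(\sqrt{x} + h)^2$ is near to $x$, and passing from $\sqrt{x}$ to $\sqrt{x} + h$ brings one more prime into the product, so we will use the approximation

$$\pi'\big((\sqrt{x} + h)^2\big) \approx \pi'(x)\left(1 - \frac{1}{\sqrt{x}}\right),$$

where we are approximating the greatest prime less than $\sqrt{x} + h$ by $\sqrt{x}$, which isn’t so very terrible for large $x$.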

Now use Taylor’s approximation to give

$$\pi'\big((\sqrt{x} + h)^2\big) = \pi'(x + 2h\sqrt{x} + h^2) \approx \pi'(x) + 2h\sqrt{x}\,\pi''(x).$$


Equate these two and simplify to the horrendous differential equation:

$$2x\,\frac{\pi''(x)}{\pi'(x)} + \pi'(\sqrt{x}) = 0.$$
Fortunately, we have a hint already; let us try $\pi(x) = x/\ln x$. The first term becomes

$$\frac{2(2 - \ln x)}{\ln x\,(\ln x - 1)}$$

and the second

$$\frac{2(\ln x - 2)}{(\ln x)^2},$$

which are of opposite sign and differ in magnitude only in that $\ln x$ replaces $(\ln x - 1)$ in the denominator.

The arguments are hardly beyond criticism but as heuristics they are fine. They have done what is needed of them, which is to point in the right direction for progress. That function $x/\ln x$ does seem to be intimately linked with $\pi(x)$.
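The claim about the two terms can be checked numerically. The sketch below (names hypothetical) differentiates the trial solution $x/\ln x$ by central finite differences, with a step of $10^{-3}$ times the point, an arbitrary choice of ours, and compares the results with closed forms obtained by differentiating $x/\ln x$ by hand, namely $2(2 - \ln x)/(\ln x(\ln x - 1))$ and $2(\ln x - 2)/(\ln x)^2$. The last column shows the two terms failing to cancel by a relative amount of $-1/(\ln x - 1)$, which does shrink, but only slowly.

```python
import math


def f(x: float) -> float:
    """The trial solution pi(x) = x / ln x."""
    return x / math.log(x)


def first_derivative(g, x: float, h: float) -> float:
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def second_derivative(g, x: float, h: float) -> float:
    """Central-difference estimate of g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)


def equation_terms(x: float) -> tuple[float, float]:
    """Numerical values of 2x f''(x) / f'(x) and f'(sqrt x)."""
    h = 1e-3 * x
    first = 2 * x * second_derivative(f, x, h) / first_derivative(f, x, h)
    root = math.sqrt(x)
    second = first_derivative(f, root, 1e-3 * root)
    return first, second


def closed_forms(x: float) -> tuple[float, float]:
    """The same two terms from differentiating x / ln x by hand."""
    t = math.log(x)
    return 2 * (2 - t) / (t * (t - 1)), 2 * (t - 2) / t ** 2


for x in (1e6, 1e12, 1e24):
    first, second = equation_terms(x)
    print(f"{x:.0e}  {first:+.5f}  {second:+.5f}  relative residual {(first + second) / second:+.4f}")
```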


15.7 A Letter 

On Christmas Eve 1849, the 72-year-old Gauss wrote a letter to his ‘distinguished friend’ and former student, the astronomer Johann Encke (1791-1865). The letter was in response to one from Encke, in which he had shown his own interest in the frequency of the primes and had posited his own estimate for $\pi(x)$, and began,

Your remarks concerning the frequency of primes were of interest 
to me in more ways than one. You have reminded me of my own 
endeavours in this field which began in the very distant past, in 
1792 or 1793, after I had acquired the Lambert supplements to the 
logarithmic tables. 

In 1792 Gauss was 15 years old. The fortuitous gift of a table of logarithms and a supplement containing tables of prime numbers up to 1 million (compiled by the German-Swiss mathematician Johann Lambert (1728-1777), whose name appeared on p. 93 in connection with the theory of continued fractions) had enabled the young boy to begin the assault on the nature of $\pi(x)$. Later Gauss would have access to tables of primes up to 3 million. Table 15.2 shows the initial information that the 15-year-old had to work with and, on the basis of this very limited evidence, it occurred to him that the pattern that was emerging was that for $x = 10^n$,

$$\pi(x) \approx \frac{1}{a} \times \frac{x}{n} = \frac{x}{a \log_{10} x},$$

where $a$ seems to be a number just over 2, and well he knew that $\ln 10 = 2.30\ldots$. The standard laws of logs then produce $\pi(x) \approx x/\ln x$, in keeping with those other heuristic pointers. This gives $f(x) = G(x) = x/\ln x$ and

$$\pi(x) = \frac{x}{\ln x} + \varepsilon_x = G(x) + \varepsilon_x.$$
Table 15.2.

$x$                   $\pi(x)$     Prime density
10 = 10^1             4            1 : 2.5 = 1 : (2.5 × 1)
100 = 10^2            25           1 : 4 = 1 : (2 × 2)
1 000 = 10^3          168          1 : 5.96 = 1 : (1.99 × 3)
10 000 = 10^4         1 229        1 : 8.14 = 1 : (2.04 × 4)
100 000 = 10^5        9 592        1 : 10.43 = 1 : (2.09 × 5)
1 000 000 = 10^6      78 498       1 : 12.74 = 1 : (2.12 × 6)




Figure 15.7. Gauss’s original estimate. 

In Figure 15.7 we have two plots of the early comparison between $\pi(x)$ and $G(x)$. His book of logarithms still survives and has written on its back cover in a young hand ‘Primzahlen unter $a\,(= \infty)\ a/la$’, that is, ‘primes under $a$ ($= \infty$): $a/\ln a$’.

In the letter, Gauss referred only to his refined estimate, which came about by localizing the count, considering the number of primes in blocks of 1000 consecutive integers. (There is use of some delightful classical language, with hecatontades for 100, chiliad for 1000 and myriad used in its accurate sense of 10 000.) He wrote that he ‘frequently spent an idle quarter of an hour to count another chiliad here and there’, which enabled him to average over smaller sub-intervals rather than across the whole interval itself and in the limit ‘add up’ the primes by integration and so arrive at

$$f(x) = \mathrm{Li}(x) = \int_2^x \frac{1}{\ln u}\,du$$

to get the estimate

$$\pi(x) = \int_2^x \frac{1}{\ln u}\,du + \varepsilon_x = \mathrm{Li}(x) + \varepsilon_x.$$

And this brings about an appearance of the logarithmic integral function $\mathrm{Li}(x)$, which we mentioned on p. 106 and which has become central in the study of the distribution of primes. Predictably, he had failed to announce the idea publicly; it was finally published posthumously in 1863 and appears on p. 11, Vol. 10, Part I of his Werke, although he did include Table 15.3 in the letter. In every case the prime count is slightly wrong, with the error for the four largest values in favour of the $\mathrm{Li}(x)$ estimate.

Table 15.3.

$x$            $\pi(x)$      $\mathrm{Li}(x)$     Difference
500 000        41 556        41 606.4             50.4
1 000 000      78 501        78 627.5             126.5
1 500 000      114 112       114 263.1            151.1
2 000 000      148 883       149 054.8            171.8
2 500 000      183 016       183 245.0            229.0
3 000 000      216 745       216 970.6            225.6

Figure 15.8. Gauss’s refined estimate.

If we integrate $\mathrm{Li}(x)$ by parts twice, we have

$$\mathrm{Li}(x) = \frac{x}{\ln x} + \frac{x}{(\ln x)^2} + \int_2^x \frac{2}{(\ln u)^3}\,du - \frac{2}{\ln 2} - \frac{2}{(\ln 2)^2}$$

and a comparison between the two logarithmic estimates, which can be continued as far as we please.
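For numerical work the logarithmic integral is usually computed from a series rather than by quadrature. The sketch below uses the classical expansion $\mathrm{li}(x) = \gamma + \ln\ln x + \sum_{n \ge 1} (\ln x)^n/(n \cdot n!)$, valid for $x > 1$, where $\mathrm{li}(x)$ denotes the integral of $1/\ln u$ taken from 0 (as a principal value). The integral from 2 used in the text is then $\mathrm{li}(x) - \mathrm{li}(2)$, with $\mathrm{li}(2) \approx 1.045$. The tabulated values in Tables 15.3 and 15.4 appear to agree with $\mathrm{li}(x)$ rather than with the integral from 2 (for example $\mathrm{li}(10^6) \approx 78\,627.5$), although a difference of about 1 is immaterial here. Function names are hypothetical and $\gamma$ is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma (hard-coded)


def li(x: float) -> float:
    """li(x) for x > 1, from gamma + ln ln x + sum_{n>=1} (ln x)^n / (n * n!)."""
    if x <= 1:
        raise ValueError("this series sketch assumes x > 1")
    t = math.log(x)
    total = EULER_GAMMA + math.log(t)
    term = 1.0  # holds t^n / n! after the update below
    n = 0
    while True:
        n += 1
        term *= t / n
        total += term / n
        if term / n < 1e-17 * abs(total) or n > 1000:
            return total


def li_from_2(x: float) -> float:
    """The integral of 1/ln u from 2 to x."""
    return li(x) - li(2)


def two_term_expansion(x: float) -> float:
    """x/ln x + x/(ln x)^2, the first terms from integrating by parts."""
    t = math.log(x)
    return x / t + x / t ** 2


for x in (500_000, 1_000_000, 3_000_000):
    print(x, round(li(x), 1), round(li_from_2(x), 1), round(two_term_expansion(x)))
```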

Comparisons for the new estimate of $\pi(x)$ are shown in Figure 15.8. By introducing these estimates, Gauss had established a bridgehead in the battle to harness the behaviour of the primes, but although he worked alone he was not alone in the work. Part way through the letter he commented,

I was not aware that Legendre had worked on this subject; your
letter caused me to look in his Théorie des Nombres, and in the
second edition I found a few pages on the subject which I must
previously have overlooked (or, by now, forgotten).
Figure 15.9. Legendre’s estimate.


He was referring to Legendre’s Essai sur la Théorie des Nombres, which originally appeared in 1798 and in an improved second edition in 1808. The original volume contained the proposal that

$$\pi(x) \approx \frac{x}{A \ln x + B}$$

for some constants $A$ and $B$, which was refined in the second edition, using tables up to 400 000, to the somewhat mysterious

$$f(x) = L(x) = \frac{x}{\ln x - A(x)}$$

and therefore that

$$\pi(x) = \frac{x}{\ln x - A(x)} + \varepsilon_x = L(x) + \varepsilon_x,$$

where $A(x) \approx 1.083\,66$. The Norwegian genius Niels Abel (1802-1829), in a letter written in 1823, described this formula as the ‘most remarkable in the whole of mathematics’. The comparisons are shown in Figure 15.9.

The mysterious 1.083 66... naturally attracted Gauss’s interest, as did the fact that up to 3 000 000, $L(x)$ was more accurate than his own $\mathrm{Li}(x)$, as we can see from Figure 15.10.

In the letter he recorded the values which $A(x)$ must take for $L(x)$ and $\pi(x)$ to agree over intervals of length 500 000 to get values for $A(x)$ of 1.090 40, 1.076 82, 1.075 82, 1.075 29, 1.071 79, 1.072 97. He continued,
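Gauss’s list can be approximately reproduced, under one assumption of ours about what he computed: reading ‘agree’ as requiring $x/(\ln x - A) = \pi(x)$ at the end of each interval, so that $A = \ln x - x/\pi(x)$, and using his own counts from Table 15.3 rather than the true values of $\pi(x)$. On this reading the first two values, 1.090 40 and 1.076 82, come out correctly; later entries need not agree to every digit, so treat this as a sketch of the idea. The names are hypothetical.

```python
import math

# Gauss's own prime counts from Table 15.3 (not the true values of pi(x)).
GAUSS_COUNTS = {
    500_000: 41_556, 1_000_000: 78_501, 1_500_000: 114_112,
    2_000_000: 148_883, 2_500_000: 183_016, 3_000_000: 216_745,
}


def legendre_constant(x: float, count: int) -> float:
    """Solve x / (ln x - A) = count for A, giving A = ln x - x / count."""
    return math.log(x) - x / count


for x, count in GAUSS_COUNTS.items():
    print(x, round(legendre_constant(x, count), 5))
```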


It appears that, with increasing x, the (average) value of A(x)
decreases; however, I dare not conjecture whether the limit as x
approaches infinity is 1 or a number different from 1. I cannot say
that there is any justification for expecting a very simple limiting
value.


If we look at the comparisons of $\pi(x)$ with the case $A(x) = 1$, we can see why Legendre would have preferred his strange 1.083 66, which must surely have been the result of repeatedly fiddling with the expression. It would be 70 years after Legendre’s death before it was shown that, in the long term, Legendre was misled and Gauss was too timid: 1 is in fact the best value.

As to the superiority of $L(x)$ to $\mathrm{Li}(x)$, Gauss commented, ‘These differences (between $L(x)$ and $\pi(x)$) are even smaller than those from the integral, but they seem to grow faster with $x$ so that it is quite possible they may surpass them’; he was right: eventually they do, and it took that same mathematician to prove the fact, but more of that later.

Encke’s own estimate is not recorded in the letter but it is interesting to note that Gauss recognized its asymptotic form:

By the way, for large x, your formula could be considered to coincide with

$$\frac{x}{\ln x - (1/2k)},$$

where k is the modulus of Briggs’s logarithms; that is, with Legendre’s formula, if we put A(x) = 1/2k = 1.1513.

By which he seems to have meant $k = \log_{10} e$, so that $1/2k = \frac{1}{2}\ln 10 = 1.1513\ldots$.

In summary, we have the tabular comparison in Tables 15.4 and 15.5. 
Table 15.4. A table of comparisons.

$x$                 $\pi(x)$        $G(x)$          $L(x)$          $\mathrm{Li}(x)$
1 000               168             145             172             178
10 000              1 229           1 086           1 231           1 246
100 000             9 592           8 686           9 588           9 630
1 000 000           78 498          72 382          78 543          78 628
10 000 000          664 579         620 421         665 140         664 918
100 000 000         5 761 455       5 428 681       5 768 004       5 762 209
1 000 000 000       50 847 534      48 254 942      50 917 519      50 849 235
10 000 000 000      455 052 511     434 294 482     455 743 004     455 055 614


Table 15.5. Percentage differences compared with $\pi(x)$.

$x$                 %$G(x)$       %$L(x)$      %$\mathrm{Li}(x)$
1 000               −13.8305      2.2027       5.9524
10 000              −11.6569      0.1232       1.3832
100 000             −9.4465       −0.0375      0.3962
1 000 000           −7.7908       0.0576       0.1656
10 000 000          −6.6446       0.0844       0.0510
100 000 000         −5.7759       0.1137       0.0131
1 000 000 000       −5.0988       0.1376       0.0033
10 000 000 000      −4.5617       0.1517       0.0007
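Tables 15.4 and 15.5 can be recomputed from their definitions. In this sketch (names hypothetical) the values of $\pi(x)$ are copied from Table 15.4 rather than computed, $\mathrm{Li}(x)$ is evaluated as $\mathrm{li}(x)$ by the series $\gamma + \ln\ln x + \sum (\ln x)^n/(n \cdot n!)$, and $\gamma$ is hard-coded. The %$G(x)$ and %$L(x)$ columns match the unrounded estimates; the %$\mathrm{Li}(x)$ column seems instead to have been computed from the rounded values of Table 15.4 (for instance $(178 - 168)/168 \approx 5.9524\%$), so the smallest cases of that column differ slightly from what the code prints.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma (hard-coded)

# pi(x) for x = 10^3, ..., 10^10, copied from Table 15.4.
PI_VALUES = {
    10**3: 168, 10**4: 1_229, 10**5: 9_592, 10**6: 78_498,
    10**7: 664_579, 10**8: 5_761_455, 10**9: 50_847_534, 10**10: 455_052_511,
}


def G(x: float) -> float:
    """Gauss's first estimate, x / ln x."""
    return x / math.log(x)


def L(x: float) -> float:
    """Legendre's estimate, x / (ln x - 1.08366)."""
    return x / (math.log(x) - 1.08366)


def li(x: float) -> float:
    """Logarithmic integral for x > 1 via gamma + ln ln x + sum t^n / (n * n!)."""
    t = math.log(x)
    total, term, n = EULER_GAMMA + math.log(t), 1.0, 0
    while True:
        n += 1
        term *= t / n
        total += term / n
        if term / n < 1e-17 * abs(total) or n > 1000:
            return total


def percent_difference(estimate: float, actual: int) -> float:
    return 100 * (estimate - actual) / actual


for x, pi_x in PI_VALUES.items():
    print(f"{x:>14,} {percent_difference(G(x), pi_x):9.4f} "
          f"{percent_difference(L(x), pi_x):8.4f} {percent_difference(li(x), pi_x):8.4f}")
```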


15.8 The Harmonic Approximation 


One last alternative expression can be extracted from the definition of the harmonic mean of the first $x$ integers. Recall that this has the form

$$H = \frac{x}{\sum_{r=1}^{x} 1/r}$$

and that, using the connection between this, $\ln$ and $\gamma$, we have that, for large $x$, $H \approx x/(\ln x + \gamma)$, another Legendre-type estimate of $\pi(x)$. This means that the number of primes up to $x$ can be approximated by the harmonic mean of the integers up to $x$, and Figure 15.12 shows this comparison.

The inequality between the harmonic and geometric means established on p. 119 for two numbers can easily be extended to give $H \le G$ for any set of positive numbers, with equality only when the numbers are all equal. If we consider the set to be the first $x$ integers (with $x \ge 2$), this means that

$$\frac{x}{\sum_{r=1}^{x} 1/r} < (1 \times 2 \times 3 \times \cdots \times x)^{1/x} = (x!)^{1/x}$$

and, once again using the logarithmic approximation to the harmonic series and Stirling’s approximation from p. 87 to the factorial, we have

$$\frac{x}{\ln x + \gamma} < \left(\sqrt{2\pi x}\,x^x e^{-x}\right)^{1/x} = \frac{(2\pi)^{1/2x}\,x^{1 + 1/2x}}{e}.$$

Figure 15.12. The harmonic estimate.

Figure 15.13. An upper bound on $\pi(x)$.


Finally, if we allow ourselves the (considerable) luxury of using the Gamma estimate to approximate $\pi(x)$, we have an upper bound on its size, with

$$\pi(x) < \frac{(2\pi)^{1/2x}\,x^{1 + 1/2x}}{e}$$

for large $x$.

The graphs in Figure 15.13 show the early and slightly later stages of this 
(again poor) comparison. 
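The chain of means is easy to check numerically. This sketch (names hypothetical) computes the harmonic and geometric means of $1, \ldots, x$ exactly (the latter through `math.lgamma`, which returns $\ln(x!)$ for integer arguments $x + 1$) and compares them with the approximations $x/(\ln x + \gamma)$ and $(2\pi)^{1/2x}x^{1+1/2x}/e$. We use $\ln x + \gamma$ because $1 + 1/2 + \cdots + 1/x \approx \ln x + \gamma$; $\gamma$ is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma (hard-coded)


def harmonic_mean_of_first(x: int) -> float:
    """H = x / (1 + 1/2 + ... + 1/x)."""
    return x / sum(1 / r for r in range(1, x + 1))


def geometric_mean_of_first(x: int) -> float:
    """(x!)^(1/x), computed through lgamma(x + 1) = ln(x!) to avoid overflow."""
    return math.exp(math.lgamma(x + 1) / x)


def harmonic_estimate(x: float) -> float:
    """x / (ln x + gamma)."""
    return x / (math.log(x) + EULER_GAMMA)


def stirling_bound(x: float) -> float:
    """(2 pi)^(1/(2x)) * x^(1 + 1/(2x)) / e, Stirling's form of (x!)^(1/x)."""
    return (2 * math.pi) ** (1 / (2 * x)) * x ** (1 + 1 / (2 * x)) / math.e


for x in (100, 1000, 10_000):
    print(x, round(harmonic_mean_of_first(x), 2), round(harmonic_estimate(x), 2),
          round(geometric_mean_of_first(x), 2), round(stirling_bound(x), 2))
```

The geometric mean grows roughly like $x/e$, so the resulting upper bound on $\pi(x)$ is true but very weak, as the text says.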


15.9 Different— and Yet the Same 


The expression $\pi(x) = f(x) + \varepsilon_x$, when rewritten as

$$\frac{\pi(x)}{f(x)} = 1 + \frac{\varepsilon_x}{f(x)},$$

allows us to concentrate on the asymptotic comparison of $\pi(x)$ and its approximations and, of course, we hope for the relative error to diminish to 0 and therefore

$$\lim_{x\to\infty} \frac{\pi(x)}{f(x)} = 1.$$

It is usual to represent such behaviour by the notation $\pi(x) \sim f(x)$.
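It is perfectly clear that, if the limit exists,

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = \lim_{x\to\infty} \frac{\pi(x)}{x/(\ln x - a)}$$

for any constant $a$, since the ratio of the two denominators, $(\ln x - a)/\ln x$, tends to 1. This makes

$$\pi(x) \sim \frac{x}{\ln x - 1.08366}, \qquad \pi(x) \sim \frac{x}{\ln x}, \qquad \pi(x) \sim \frac{x}{\ln x - 1}, \qquad \pi(x) \sim \frac{x}{\ln x + \gamma}$$

equivalent statements in this sense.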




That

$$\pi(x) \sim \frac{x}{\ln x} \quad \text{and} \quad \pi(x) \sim \int_2^x \frac{1}{\ln u}\,du$$

are also equivalent takes a bit more work, and we need the help of L’Hôpital’s Rule.

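One way around, if we assume that

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = 1,$$

then

$$\lim_{x\to\infty} \frac{\pi(x)}{\int_2^x (1/\ln u)\,du} = \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} \cdot \frac{x/\ln x}{\int_2^x (1/\ln u)\,du} = 1 \cdot \lim_{x\to\infty} \frac{x/\ln x}{\int_2^x (1/\ln u)\,du}$$

and using L’Hôpital’s Rule this becomes

$$\lim_{x\to\infty} \frac{(\ln x - x \cdot (1/x))/(\ln x)^2}{1/\ln x} = \lim_{x\to\infty} \left(\frac{\ln x - 1}{(\ln x)^2} \cdot \ln x\right) = \lim_{x\to\infty} \frac{\ln x - 1}{\ln x} = 1. \qquad (15.1)$$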


The reverse argument is the same. With this established, we can state the celebrated Prime Number Theorem.




Prime Number Theorem

$\pi(x) \sim G(x)$ or equivalently $\pi(x) \sim L(x)$ or $\pi(x) \sim \mathrm{Li}(x)$.

We could, of course, add in $\pi(x) \sim x/(\ln x - a)$ for $a = 1$ or otherwise.


15.10 There are Really Two Questions, Not Three 


A little work shows that the Prime Number Theorem is equivalent to an estimate of the $x$th prime, namely $p_x \sim x \ln x$.

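If the Prime Number Theorem is true and if the $x$th prime is written $p_x$, then clearly $\pi(p_x) = x$, which intimately associates the growth of $\pi(x)$ with $x$ and $p_x$ with $x$, and we have

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = 1 \;\Rightarrow\; \ln\left(\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x}\right) = \ln 1 = 0$$

$$\Rightarrow\; \lim_{x\to\infty} \ln \frac{\pi(x)}{x/\ln x} = 0$$

$$\Rightarrow\; \lim_{x\to\infty} \left(\ln \pi(x) - \ln x + \ln\ln x\right) = 0$$

$$\Rightarrow\; \lim_{x\to\infty} \ln x \left(\frac{\ln \pi(x)}{\ln x} + \frac{\ln\ln x}{\ln x} - 1\right) = 0.$$

Since $\ln x$ is unbounded,

$$\lim_{x\to\infty} \left(\frac{\ln \pi(x)}{\ln x} + \frac{\ln\ln x}{\ln x} - 1\right) = 0$$

and, since also

$$\lim_{x\to\infty} \frac{\ln\ln x}{\ln x} = 0,$$

we have that

$$\lim_{x\to\infty} \frac{\ln \pi(x)}{\ln x} = 1.$$

Therefore

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} \times \lim_{x\to\infty} \frac{\ln \pi(x)}{\ln x} = \lim_{x\to\infty} \frac{\pi(x) \ln \pi(x)}{x} = 1.$$

Now replace $x$ by the $x$th prime $p_x$; then, as we have already said, $\pi(p_x) = x$ and the equation becomes

$$\lim_{x\to\infty} \frac{x \ln x}{p_x} = 1$$

and so $p_x \sim x \ln x$.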
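To show the converse we now assume that $p_n \sim n \ln n$ and, for a given $x$, define $n$ by $p_n \le x < p_{n+1}$. Then $p_n \sim n \ln n$ and $p_{n+1} \sim (n + 1)\ln(n + 1) \sim n \ln n$ for $n$ large. This means that $x \sim n \ln n$. Also, $\pi(x) = n$, so that $x \sim \pi(x) \ln \pi(x)$. Therefore,

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} = \lim_{x\to\infty} \frac{\pi(x) \ln x}{x} = \lim_{x\to\infty} \frac{\pi(x) \ln x}{\pi(x) \ln \pi(x)} = \lim_{x\to\infty} \frac{\ln x}{\ln \pi(x)} = 1,$$

the last step because taking logarithms of $x \sim \pi(x) \ln \pi(x)$ gives $\ln x = \ln \pi(x) + \ln\ln \pi(x) + o(1)$.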

A more delicate argument establishes the sharper approximation $p_x \approx x(\ln x + \ln\ln x - 1)$, and there are improvements to this too. For example, the approximations $x \ln x$ and $x(\ln x + \ln\ln x - 1)$ predict that the one-millionth prime is about 13 800 000 and 15 400 000, respectively; in fact, the one-millionth prime is 15 485 863. In a 1967 paper Rosser and Schoenfeld also showed that

$$x(\ln x + \ln\ln x - 1.5) < p_x < x(\ln x + \ln\ln x - 0.5)$$

for $x \ge 20$.
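These estimates for the $x$th prime can be compared with actual primes. The sketch below (names hypothetical) generates primes by a sieve whose limit is doubled until enough primes are found, so it does not rely on any bound for $p_x$; it then checks the Rosser-Schoenfeld inequalities quoted above for $20 \le x \le 10\,000$ and evaluates the two approximations at $x = 10^6$.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """All primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def first_primes(count: int) -> list[int]:
    """The first `count` primes, doubling the sieve limit until enough are found."""
    limit = 16
    while True:
        primes = primes_up_to(limit)
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


def crude(x: float) -> float:
    return x * math.log(x)


def refined(x: float) -> float:
    return x * (math.log(x) + math.log(math.log(x)) - 1)


def rosser_schoenfeld(x: float) -> tuple[float, float]:
    """x(ln x + ln ln x - 1.5) and x(ln x + ln ln x - 0.5)."""
    base = math.log(x) + math.log(math.log(x))
    return x * (base - 1.5), x * (base - 0.5)


primes = first_primes(10_000)
within = all(lo < primes[n - 1] < hi
             for n in range(20, 10_001)
             for lo, hi in [rosser_schoenfeld(n)])
print(primes[-1], within, round(crude(10**6)), round(refined(10**6)))
```

The 10 000th prime printed is 104 729, and the two estimates for the one-millionth prime come out at roughly 13.8 and 15.4 million.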


15.11 Enter Chebychev with Some Good Ideas 

So, we have several empirical formulae, asymptotically identical but producing different errors in approximating $\pi(x)$, and we have a ‘theorem’ without a proof. The first major step forward towards achieving a proof was brought about by Chebychev, who used Legendre’s result (mentioned on p. 165) and Euler’s identity; he also added two functions to his mathematical toolkit.

We can think of the prime counting function being defined by

$$\pi(x) = \sum_{\substack{p \le x \\ p \text{ prime}}} 1,$$

that is, a step function which increases by 1 whenever a prime is reached. Chebychev generalized this to a weighted prime counting function

$$\psi(x) = \sum_{\substack{p^r \le x \\ p \text{ prime}}} \ln p,$$

which increases by $\ln p$ whenever a power of a prime is reached; the sum is taken over all prime powers $p^r$ ($r \ge 1$) that are less than or equal to $x$, each contributing $\ln p$.

Table 15.6. Some values of $\psi(x)$.

$x$          100     200     300     400     500     600     700     800     900     1000
$\psi(x)$    94.04   206.1   299.2   397.8   501.7   593.9   699.0   792.7   897.2   996.7

For example,


$$\psi(20) = (\ln 2 + \ln 3 + \ln 5 + \ln 7 + \ln 11 + \ln 13 + \ln 17 + \ln 19) + (\ln 2 + \ln 3) + (\ln 2) + (\ln 2) = 19.2656\ldots$$

and

$$\begin{aligned} \psi(30) = {} & (\ln 2 + \ln 3 + \ln 5 + \ln 7 + \ln 11 + \ln 13 + \ln 17 + \ln 19 + \ln 23 + \ln 29) \\ & + (\ln 2 + \ln 3 + \ln 5) + (\ln 2 + \ln 3) + (\ln 2) = 28.4765\ldots \end{aligned}$$

where the terms are bracketed so that $p \le x^{1/r}$ for $r = 1, 2, 3, 4$. (A little thought shows that, in fact, $\psi(x) = \ln(\mathrm{l.c.m.}\{1, 2, 3, \ldots, \lfloor x \rfloor\})$.) Chebychev also defined the function $\theta(x) = \sum_{p \le x} \ln p$ and, using this and the above bracketing, we can easily see that $\psi(x)$ can be written as the finite series ($\theta(y)$ must be zero for $y < 2$)

$$\psi(x) = \theta(x) + \theta(x^{1/2}) + \theta(x^{1/3}) + \theta(x^{1/4}) + \cdots.$$
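The three descriptions of $\psi$, as a sum over prime powers, as $\sum_r \theta(x^{1/r})$ and as the logarithm of an l.c.m., can be checked against one another. In this sketch (names hypothetical) the roots $x^{1/r}$ are taken as exact integer roots by the helper `integer_root`, our own guard against floating-point error such as $125^{1/3}$ evaluating slightly below 5; `math.lcm` requires Python 3.9 or later.

```python
import math
from functools import reduce


def primes_up_to(n: int) -> list[int]:
    """All primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def integer_root(x: int, r: int) -> int:
    """Largest integer m with m ** r <= x."""
    m = int(round(x ** (1.0 / r)))
    while m ** r > x:
        m -= 1
    while (m + 1) ** r <= x:
        m += 1
    return m


def theta(n: int) -> float:
    """theta(n): the sum of ln p over primes p <= n (zero for n < 2)."""
    return sum(math.log(p) for p in primes_up_to(n))


def psi(n: int) -> float:
    """psi(n): ln p summed over every prime power p^r <= n, r >= 1."""
    total = 0.0
    for p in primes_up_to(n):
        power = p
        while power <= n:
            total += math.log(p)
            power *= p
    return total


def psi_via_theta(n: int) -> float:
    """theta(n) + theta(n^(1/2)) + theta(n^(1/3)) + ..., while the root is >= 2."""
    total, r = 0.0, 1
    while 2 ** r <= n:
        total += theta(integer_root(n, r))
        r += 1
    return total


def log_lcm(n: int) -> float:
    """ln(lcm(1, 2, ..., n))."""
    return math.log(reduce(math.lcm, range(1, n + 1), 1))


print(round(psi(20), 4), round(psi(30), 4), round(psi(100), 2), round(psi(1000), 1))
```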


We can also see that in the two numeric cases detailed above and in Table 15.6, $\psi(x)$ is pretty near to $x$. Is this a coincidence? Not if the Prime Number Theorem is true, since the statement $\psi(x) \sim x$ is equivalent to it; in fact, we have the

Crucial Equivalence

$$\frac{\pi(x)}{x/\ln x}, \qquad \frac{\theta(x)}{x}, \qquad \frac{\psi(x)}{x}$$

have the same asymptotic limit,

and to prove this Chebychev argued in the following way, which we have taken from A. E. Ingham’s treatise The Distribution of Prime Numbers and which we will mention again on p. 188.

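First, for each prime $p \le x$, the number of powers $p^r$ with $p^r \le x$ is the maximum $r$ such that $r \le \ln x/\ln p$, that is, $r = \lfloor \ln x/\ln p \rfloor$. This means that

$$\psi(x) = \sum_{p \le x} \left\lfloor \frac{\ln x}{\ln p} \right\rfloor \ln p.$$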
Now write the three (possibly infinite) limits as $L_1$, $L_2$ and $L_3$, respectively. Then we have the double inequality

$$\theta(x) \le \psi(x) = \sum_{p \le x} \left\lfloor \frac{\ln x}{\ln p} \right\rfloor \ln p \le \sum_{p \le x} \ln x = \ln x\,\pi(x),$$

which means that

$$\frac{\theta(x)}{x} \le \frac{\psi(x)}{x} \le \frac{\pi(x)}{x/\ln x}$$

and this means, taking the limit as $x \to \infty$, $L_2 \le L_3 \le L_1$.
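Now suppose that $0 < \alpha < 1$ and that $x > 1$. Then

$$\theta(x) \ge \sum_{x^\alpha < p \le x} \ln p \ge \ln x^\alpha \sum_{x^\alpha < p \le x} 1 = \ln x^\alpha\,\big(\pi(x) - \pi(x^\alpha)\big).$$

Since $\pi(x^\alpha) < x^\alpha$ we have $\theta(x) \ge \ln x^\alpha\,(\pi(x) - x^\alpha)$ and

$$\frac{\theta(x)}{x} \ge \frac{\alpha(\pi(x)\ln x - x^\alpha \ln x)}{x} = \alpha\left(\frac{\pi(x)}{x/\ln x} - \frac{\ln x}{x^{1-\alpha}}\right).$$

As $x \to \infty$, $\ln x/x^{1-\alpha} \to 0$, which leaves us with $L_2 \ge \alpha L_1$ and, since this is true for $\alpha$ arbitrarily close to 1, $L_2 \ge L_1$. Combine this with the first inequality and we have the result.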
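A quick numerical look at the three ratios of the Crucial Equivalence, using the same definitions (a sketch; the helper names are ours). The double inequality $\theta(x) \le \psi(x) \le \pi(x)\ln x$ can be seen directly, and the first ratio visibly approaches its limit far more slowly than the other two.

```python
import math


def primes_up_to(n: int) -> list[int]:
    """All primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def chebyshev_ratios(x: int) -> tuple[float, float, float]:
    """Return pi(x)/(x/ln x), theta(x)/x and psi(x)/x for an integer x >= 2."""
    primes = primes_up_to(x)
    theta = sum(math.log(p) for p in primes)
    psi = 0.0
    for p in primes:
        r, power = 0, p
        while power <= x:  # r ends as floor(ln x / ln p), counted exactly
            r += 1
            power *= p
        psi += r * math.log(p)
    return len(primes) * math.log(x) / x, theta / x, psi / x


for x in (10**3, 10**4, 10**5, 10**6):
    print(x, *(round(v, 4) for v in chebyshev_ratios(x)))
```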

By this means, the search for a proof that $\pi(x) \sim x/\ln x$ can be altered to a search for a proof that $\psi(x) \sim x$. Using such ideas in 1852, in the first of two important papers, Chebychev showed that for arbitrarily large $x$

$$\int_2^x \frac{du}{\ln u} - \frac{\alpha x}{\ln^n x} < \pi(x) < \int_2^x \frac{du}{\ln u} + \frac{\alpha x}{\ln^n x}$$

for any positive integer $n$ and arbitrarily small $\alpha > 0$, a result which, with $n = 1$, he developed into

$$\frac{\int_2^x (1/\ln u)\,du}{x/\ln x} - \alpha < \frac{\pi(x)}{x/\ln x} < \frac{\int_2^x (1/\ln u)\,du}{x/\ln x} + \alpha$$
and so

$$\lim_{x\to\infty} \frac{\int_2^x (1/\ln u)\,du}{x/\ln x} - \alpha \le \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} \le \lim_{x\to\infty} \frac{\int_2^x (1/\ln u)\,du}{x/\ln x} + \alpha$$

or, using the equivalence (15.1) on p. 181,

$$1 - \alpha \le \lim_{x\to\infty} \frac{\pi(x)}{x/\ln x} \le 1 + \alpha,$$

which means that if

$$\lim_{x\to\infty} \frac{\pi(x)}{x/\ln x}$$

does exist, then it must be 1. In the same paper he also showed that the relative error in the approximation of $\pi(x)$ by $\mathrm{Li}(x)$ is less than 11% for large $x$, but his further attempts to show that it was asymptotically 0 failed.

In his second paper on the subject, dated 1854, he began to close in on the result in that he showed for large $x$,

$$A_1 < \frac{\pi(x)}{x/\ln x} < A_2,$$

where $0.922\ldots < A_1 < 1$ and $1 < A_2 < 1.105\ldots$. These were major steps forward and they formed a firm base from which to launch attacks on the problem, but the pathway to the ultimate goal seemed irrevocably blocked.

Others tried. None succeeded. Not for another 100 years would a proof be 
found which is based on ‘real’ numbers. 

A new direction was taken by Dirichlet, whom we have mentioned already 
on p. 112. In essence, he generalized the definition of the Zeta function and thereby brought to the mathematical world the L-functions, which are a linchpin of modern number theory. We will steer past this elegant and important initiative, but not before mentioning that in 1837 Dirichlet used it to prove the conjecture of Legendre that every arithmetic sequence of integers (with first term co-prime to the common difference) contains an infinite number of primes, and so produced one of the greatest achievements of 19th-century mathematics.

Euler had originally brought analysis into number theory with his identity, 
Chebychev and Dirichlet had developed the initiative — and then came Riemann 
with a single idea announced in a single paper. 


15.12 Enter Riemann, Followed by Proof(s) 

Encke was one of Gauss’s distinguished students; Bernhard Riemann (1826-1866) was another. His name has already appeared in these pages but here he plays his most significant role in our story. Shy and introspective, his health
never strong, he died of tuberculosis in Italy on the shores of Lake Maggiore; 
he was 40 years old and mathematically active until the end: the year of his death 
was also the year in which he was elected to The Royal Society as a Foreign 
Member. His ‘Habilitation’ lecture (the final requirement for his acceptance 
as a lecturer at Göttingen University) had the title ‘On the hypotheses that
lie at the foundations of geometry’, and was delivered on 10 June 1854. It 
was the third of three titles from which the aged Gauss was to choose and 
quite the most surprising — and fortunate. Building on Gauss’s own ideas, it 
brought to the mathematical world the clear idea of the intrinsic geometry of 
space and paved the way for Einstein to formulate his theories of relativity 
and was to become a classic of mathematics, even though few (other than 
Gauss) were able to appreciate its profundity at the time. Our interest lies in 
another paper and the only one that he ever published on number theory. ‘On 
the number of prime numbers less than a given quantity’ was submitted to the 
Berlin Academy of Sciences in 1859 as evidence of his latest research and 
just as his paper on geometry revolutionized the current views of space, finally 
freeing it from the Euclidean constraints, so his paper on number theory showed 
an entirely new and incredibly fruitful direction in which to head in pursuit of 
those unpredictable primes. It was not meant to be an attack on the Prime 
Number Theorem but to provide an entirely new way of counting primes and 
therefore of approximating $\pi(x)$, and it did so by utilizing complex numbers and
in particular the techniques of the new discipline of complex function theory. 
His approach was not rigorous but it scattered about the most fertile ideas as it 
rushed headlong through its eight pages and, using and refining these initiatives, 
two later mathematicians met with eventual success and finally provided the 
proof that had eluded so many for so long. 

Legendre and Gauss had raised the issue of the nature of the prime count- 
ing function and with Gauss’s involvement there is an inescapable feeling of 
deja vu. It was he who had looked into the asymptotic statistical behaviour of 
almost all continued fractions and proposed a logarithmic solution involving a 
diminishing error term. The problem was not solved by him and it took a cen- 
tury before it was solved and then by two mathematicians, independently and 
nearly simultaneously, a result that brought some sort of order into a seemingly 
chaotic world. All of this is true of the Prime Number Theorem. Building on 
Riemann’s ideas, the Belgian de la Vallée Poussin (whom we met on p. 113)
and the Frenchman Jacques Hadamard (1865-1963) finally justified the word 
‘theorem’ being used when in 1896 they showed that the relative error term in 
the approximation of $\pi(x)$ by $\mathrm{Li}(x)$ was asymptotically zero. With all this profound mathematics around it is amusing to note that the proofs relied in part on nothing more than the elementary trigonometric identity

$$3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^2 \ge 0.$$
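How can such a humble identity matter? One standard way it is put to work (a streamlined argument usually credited to Mertens, rather than the original 1896 proofs) can be checked numerically. For real $\sigma > 1$, each prime contributes a factor $|1 - p^{-\sigma}|^{-3}\,|1 - p^{-\sigma - it}|^{-4}\,|1 - p^{-\sigma - 2it}|^{-1}$ to $\zeta(\sigma)^3|\zeta(\sigma + it)|^4|\zeta(\sigma + 2it)|$, and the logarithm of that factor is $\sum_{k \ge 1} p^{-k\sigma}\bigl(3 + 4\cos(kt\ln p) + \cos(2kt\ln p)\bigr)/k$, a sum of non-negative terms by the identity with $\theta = kt\ln p$. So the whole product is at least 1, which is incompatible with a zero of $\zeta$ at $1 + it$: as $\sigma \to 1$, the fourth power of such a zero would outweigh the triple pole. A minimal sketch with illustrative names:

```python
import math

def identity_gap(theta):
    """(3 + 4 cos θ + cos 2θ) - 2(1 + cos θ)^2, which should vanish for every θ."""
    c = math.cos(theta)
    return 3 + 4 * c + math.cos(2 * theta) - 2 * (1 + c) ** 2

def local_factor(p, sigma, t):
    """Prime p's contribution to zeta(sigma)^3 |zeta(sigma+it)|^4 |zeta(sigma+2it)|, sigma > 1."""
    def inv_euler(s):
        # modulus of one Euler-product factor, |1 - p^(-s)|^(-1)
        return 1 / abs(1 - p ** (-s))
    return (inv_euler(complex(sigma, 0)) ** 3
            * inv_euler(complex(sigma, t)) ** 4
            * inv_euler(complex(sigma, 2 * t)))

def log_series(p, sigma, t, terms=60):
    """The logarithm of local_factor, written as a sum of non-negative terms."""
    a = t * math.log(p)
    return sum(p ** (-k * sigma) / k * (3 + 4 * math.cos(k * a) + math.cos(2 * k * a))
               for k in range(1, terms + 1))

primes = [q for q in range(2, 200) if all(q % d for d in range(2, math.isqrt(q) + 1))]
worst = min(local_factor(p, s, t)
            for p in primes for s in (1.01, 1.1, 2.0) for t in (0.5, 1.0, 14.13, 100.0))
print(worst)   # stays above 1, as the identity guarantees
```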

We will look more closely at Riemann’s initiative in the final chapter but 
whatever the detail and with all the joy of success, it seemed unnatural that 
complex numbers were needed to prove a result about primes. Had a real-number proof escaped the scrutiny of the many mathematicians who had tried to find one? It seemed not, even as recently as 1932, since in that year the distinguished
number theorist A. E. Ingham’s much respected tract The Distribution of Prime 
Numbers was published, from which we gleaned that earlier proof of the ‘Crucial 
Equivalence’ of p. 184, and in the introduction he expressed the view: 

The solution (of the Prime Number Theorem) just outlined (that 
of de la Vallée Poussin and Hadamard) may be held to be unsatisfactory
in that it introduces ideas very remote from the original
problem, and it is natural to ask for a proof of the Prime Number 
Theorem not depending on the theory of functions of a complex 
variable. To this we must reply that at present no such proof is 
known. We can indeed go further and say that it seems unlikely 
that a genuinely ‘real variable’ proof will be discovered, at any 
rate so long as the theory is founded on Euler’s identity. For every 
known proof of the Prime Number Theorem is based on a certain 
property of the complex zeros of $\zeta(s)$, and this conversely is a
simple consequence of the Prime Number Theorem itself. It seems
clear therefore that this property must be used (explicitly or implicitly)
in any proof based on $\zeta(s)$, and it is not easy to see how this
is to be done if we take account only of real values of $s$.

It was no small matter, then, that in 1949 Atle Selberg (born 1917) published 
such a proof; indeed, it led to his award of the Fields Medal, which has played 
the role of the Nobel Prize in mathematics. Since that time other real-variable 
proofs have emerged, all termed ‘elementary’ and all fantastically difficult! 

De la Vallée Poussin was particularly interested in the size of the error term involved in the approximations of $\pi(x)$ and in an 1899 paper forever put to rest any doubts regarding primacy (!) among them. Confounding Legendre, and plenty of numeric evidence, he proved that 1 is asymptotically the optimal choice for $a$ in the expression

$$\pi(x) \approx \frac{x}{\ln x - a}.$$

(In 1962, Rosser and Schoenfeld showed that $x/(\ln x - 0.5) < \pi(x) < x/(\ln x - 1.5)$ for $x \ge 67$.) In the same 1899 paper de la Vallée Poussin sounded the death knell for such estimates of $\pi(x)$ in that he proved, for large values of $x$, $\mathrm{Li}(x)$ is better than any of them.
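These estimates are easy to test. The sketch below (Python, with illustrative helper names) compares $x/(\ln x - a)$ for $a = 1$ and for Legendre's empirical value $a = 1.08366$ with $\pi(x)$, and checks the Rosser and Schoenfeld bounds at every integer from 67 to $10^6$. A finite check like this proves nothing about large $x$, of course; it only illustrates the statements.

```python
import math

def prime_pi_table(n):
    """table[k] = pi(k), the number of primes <= k, for 0 <= k <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    table, count = [0] * (n + 1), 0
    for k in range(n + 1):
        count += is_prime[k]
        table[k] = count
    return table

def legendre_form(x, a):
    """The approximation x / (ln x - a)."""
    return x / (math.log(x) - a)

N = 10**6
pi = prime_pi_table(N)
for x in (10**4, 10**5, 10**6):
    print(x, pi[x], round(legendre_form(x, 1.0)), round(legendre_form(x, 1.08366)))

# Rosser and Schoenfeld: x/(ln x - 0.5) < pi(x) < x/(ln x - 1.5) for x >= 67
rs_holds = all(legendre_form(x, 0.5) < pi[x] < legendre_form(x, 1.5)
               for x in range(67, N + 1))
print(rs_holds)
```

At $10^6$ Legendre's value is still the better of the two; de la Vallée Poussin's result concerns what happens as $x \to \infty$, well beyond such tables.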

What has complex function theory to do with prime numbers? Just how accurate is the approximation of $\mathrm{Li}(x)$ to $\pi(x)$? Simple enough questions perhaps, but ones with very, very complicated answers.
The Riemann Initiative


The Zeta function is probably the most challenging and mysterious object of 
modern mathematics, in spite of its utter simplicity. . . The main interest comes 
from trying to improve the Prime Number Theorem, i.e. getting better estimates 
for the distribution of the prime numbers. The secret to the success is assumed to 
lie in proving a conjecture which Riemann stated in 1859 without much fanfare, 
and whose proof has since then become the single most desirable achievement 
for a mathematician. 

M. C. Gutzwiller 


16.1 Counting Primes the Riemann Way 


In his paper Riemann considered another weighted prime counting function, which we will write as $\Pi(x)$, related to the harmonic series and defined by

$$\Pi(x) = \sum_{\substack{p^r < x \\ p \text{ prime}}} \frac{1}{r},$$

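which again reveals a bit more about itself if we look at a couple of examples:

$$\Pi(20) = \sum_{\substack{p^r < 20 \\ p \text{ prime}}} \frac{1}{r} = \left(1 + \tfrac12 + \tfrac13 + \tfrac14\right) + \left(1 + \tfrac12\right) + 1 + 1 + 1 + 1 + 1 + 1,$$

where the bracketing is by the primes 2, 3, 5, ..., 19, and

$$\Pi(30) = \sum_{\substack{p^r < 30 \\ p \text{ prime}}} \frac{1}{r} = \left(1 + \tfrac12 + \tfrac13 + \tfrac14\right) + \left(1 + \tfrac12 + \tfrac13\right) + \left(1 + \tfrac12\right) + 1 + 1 + 1 + 1 + 1 + 1 + 1,$$

where the bracketing is by the primes 2, 3, 5, ..., 29. These can be rewritten as

$$\Pi(20) = (1 + 1 + 1 + 1 + 1 + 1 + 1 + 1) + \tfrac12(1 + 1) + \tfrac13(1) + \tfrac14(1)$$

and

$$\Pi(30) = (1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1) + \tfrac12(1 + 1 + 1) + \tfrac13(1 + 1) + \tfrac14(1).$$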

The first bracket just counts the primes less than the number, the second those less than its square root, etc., to suggest in general that

$$\Pi(x) = \sum_{r=1}^{\infty} \frac{1}{r}\,\pi(x^{1/r}),$$

where, of course, only a finite number of the terms are non-zero.
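Both descriptions of $\Pi(x)$ can be checked with exact rational arithmetic. A minimal sketch, assuming the strict inequality $p^r < x$ of the definition above (so $\pi$ here counts primes strictly below its argument); the function names are illustrative:

```python
from fractions import Fraction
import math

def primes_below(n):
    """All primes p < n (trial division is plenty for small n)."""
    return [p for p in range(2, n) if all(p % d for d in range(2, math.isqrt(p) + 1))]

def big_pi_direct(x):
    """Pi(x): add 1/r for every prime power p^r < x."""
    total = Fraction(0)
    for p in primes_below(x):
        r = 1
        while p ** r < x:
            total += Fraction(1, r)
            r += 1
    return total

def big_pi_from_pi(x):
    """Sum over r of (1/r) * #{primes p < x^(1/r)}, testing p^r < x to avoid roots."""
    total, r = Fraction(0), 1
    while 2 ** r < x:                      # beyond this no prime qualifies
        count = sum(1 for p in primes_below(x) if p ** r < x)
        total += Fraction(count, r)
        r += 1
    return total

print(big_pi_direct(20), big_pi_from_pi(20))   # 115/12 both ways
print(big_pi_direct(30), big_pi_from_pi(30))   # 149/12 both ways
```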

The next step involved another of Gauss’s students, August Möbius (1790-1868), who is best known for his one-sided band. He also produced a sophisticated ‘changing the subject of a formula’ technique known as Möbius Inversion, which allowed Riemann to arrive at the formula

$$\pi(x) = \sum_{r=1}^{\infty} \frac{\mu(r)}{r}\,\Pi(x^{1/r}),$$


where $\mu(r)$ is the Möbius function, which is somewhat esoterically defined by $\mu(1) = 1$ and

$$\mu(r) = \begin{cases} 0 & r \text{ has a repeated factor},\\ 1 & r \text{ has an even number of prime factors},\\ -1 & r \text{ has an odd number of prime factors}. \end{cases}$$
Taken out of context, this seems strange but the move is a standard number-theoretic one and not nearly as bizarre as a first impression suggests.
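Möbius inversion can be checked the same way: compute $\mu(r)$ from its definition, form $\Pi(x^{1/r})$ exactly, and confirm that $\sum_r \mu(r)\Pi(x^{1/r})/r$ returns the prime count. This is a sketch with illustrative names, again using the strict inequality $p^r < x$; the sum is finite because $\Pi(x^{1/r}) = 0$ once $2^r \ge x$.

```python
from fractions import Fraction
import math

def mobius(n):
    """mu(n): 0 if n has a repeated prime factor, else (-1)^(number of prime factors)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def primes_below(n):
    """All primes p < n, by trial division."""
    return [p for p in range(2, n) if all(p % d for d in range(2, math.isqrt(p) + 1))]

def big_pi_root(x, r):
    """Pi(x^(1/r)) = sum of 1/k over prime powers p^k < x^(1/r), i.e. p^(k r) < x."""
    total = Fraction(0)
    for p in primes_below(x):
        k = 1
        while p ** (k * r) < x:
            total += Fraction(1, k)
            k += 1
    return total

def pi_by_inversion(x):
    """pi(x) = sum_{r >= 1} mu(r)/r * Pi(x^(1/r)); every term vanishes once 2^r >= x."""
    total, r = Fraction(0), 1
    while 2 ** r < x:
        total += Fraction(mobius(r), r) * big_pi_root(x, r)
        r += 1
    return total

print([mobius(n) for n in range(1, 13)])   # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
print(pi_by_inversion(100))                # 25 primes below 100
```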

This is all very well, but all of this is of no use unless $\Pi(x)$ can be found
by other means, and that other means was something of a favourite technique 
of Riemann’s, and of a growing number of other contemporaries: the use of 
complex numbers and particularly complex function theory. 

16.2 A New Mathematical Tool 

Two parts of the unique Parisian postal system are the 7th and 15th Arrondisse- 
ments, and they are connected by more than adjacency: the 7th, apart from 
anything else, is home to Gustave Eiffel’s tower, built as part of the World’s 
Fair of 1889; the 15th to the Rue Cauchy. Each commemorates in its own way 
the contribution of Augustin Louis Cauchy (1789-1857), whose name appears 
on a plaque on the first stage of the tower, along with 71 other prominent French
scientists. Whether one subscribes to the view that ‘Cauchy was an admirable 
type of the true Catholic savant’ or that he was possessed of ‘self-righteous 
obstinacy and aggressive religious bigotry’, he was a great mathematician and 
comparable to Euler in the volume of his mathematical output, which was as 
varied as it was profound, but unlike the mathematically flamboyant Euler, 
Cauchy was a rigorist and his contributions to the 19th-century search for a
firm foundation for mathematics were second to none. We are interested in his 
involvement in the development of complex function theory and many famous 
names appear in the list of those who advanced this important area of mathemat- 
ics: Euler, Gauss, Riemann, d’Alembert, Laplace, Poisson, etc., but his stands 
above them all, although we will have need of only a small (but significant) 
part of the vast subject that it has become. In fact, to understand the impact of 
it on the study of prime numbers we will need three basic ideas from it: how 
to differentiate, how to integrate and the concept of analytic continuation. Dif- 
ferentiation is a very reasonable extension of the real case, with ‘differentiable’ 
equivalent to ‘analytic’. Integration is more difficult (it always is) and requires 
the concept of integrating along a curve, or ‘contour’. Analytic continuation 
is initially unbelievable. The technical details of complex differentiation and 
integration are approached in Appendix D; here we will simply put them to use, 
but first we need to define and appreciate analytic continuation. 

16.3 Analytic Continuation 

The replacement of ‘differentiable’ by ‘analytic’ is more than semantic pedantry. 
Differentiation is essentially a limiting process and for a real function the limit 
can be approached from just two directions and must be independent of the 
direction chosen (which is why $f(x) = |x|$ is not differentiable at the origin).


Figure 16.1. The problem of real continuation.


In the complex case there is an infinite number of possible directions, and 
again the answer must not depend on which of them is chosen. This makes 
great demands on the function and brings about strong results — one of which 
is analytic continuation. The process is probably best approached from the real 
case. For example, consider the functions

$$f_1(x) = 1 + x + x^2 + x^3 + \cdots \qquad\text{and}\qquad f_2(x) = \frac{1}{1 - x}.$$

The theory of geometric series tells us that the first function converges only in the domain $|x| < 1$ and that the two functions are the same inside it. Plotting $f_1(x)$ to some number of terms and $f_2(x)$ on the same axes emphasizes that fact, and the difference between the two outside the interval (see Figure 16.1). There is not much sense in saying that the two are the same for $|x| > 1$, or that any of the infinite number of approximations to $f_1(x)$ will ever approach $f_2(x)$ in this region. Perhaps this all seems obvious, but replacing $x \in \mathbb{R}$ by $z \in \mathbb{C}$ changes everything since we have the following uniqueness theorem.

If in some complex domain A, two analytic functions are defined 
and are equal at all points on a curve C lying inside A, they are 
equal throughout A. 

Let us pause to reflect on the enormity of what is being said. For example, 
suppose that two analytic functions are defined on the whole of C and are known 
to coincide just over the interval [0, 1] on the real axis; then they must be equal 
everywhere else. Referring back to our example, 


$$f_1(z) = 1 + z + z^2 + z^3 + \cdots = f_2(z) = \frac{1}{1 - z} \quad \text{only for } |z| < 1,$$

and $f_1(z)$ is defined only in that circular region. Yet $f_2(z)$ is defined in all of $\mathbb{C}$ apart from $z = 1$, and so by the uniqueness theorem it is the extension of $f_1(z)$. It is like a sleight-of-hand trick.
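The contrast between $f_1$ and $f_2$ can be seen numerically for complex arguments too. In this short sketch (illustrative names) a 200-term partial sum of $f_1$ matches $1/(1 - z)$ inside the unit disc and runs away outside it, even though $1/(1 - z)$ is perfectly well defined there:

```python
def partial_sum(z, n_terms):
    """1 + z + z^2 + ... + z^(n_terms - 1): the series f_1 truncated."""
    total, power = 0j, 1 + 0j
    for _ in range(n_terms):
        total += power
        power *= z
    return total

def f2(z):
    """The closed form 1/(1 - z), defined for every complex z except z = 1."""
    return 1 / (1 - z)

for z in (0.5, 0.3 + 0.4j, -0.9j, 1.5, -2, 1 + 1j):
    gap = abs(partial_sum(z, 200) - f2(z))
    print(f"z = {z}: |z| = {abs(z):.2f}, |partial sum - 1/(1-z)| = {gap:.3g}")
```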


16.4 Riemann’s Extension of the Zeta Function 


Riemann’s approach to the continuation of the Zeta function was to use contour 
integration and we deal with the details in Appendix E, but the result is that 


$$\zeta(z) = \frac{\Gamma(1 - z)}{2\pi i}\int_C \frac{(-u)^{z}}{e^{u} - 1}\,\frac{du}{u}$$

extends the definition of Zeta to all $z \neq 1$, for a particular contour $C$. We can see evidence of the Beautiful Relationship which we established on p. 60, with the real integral replaced by a particular contour integral.


16.5 Zeta’s Functional Equation 

In a paper read in 1749 but not published until 1761, Euler suggested that the (real) Zeta function satisfied the exotic functional relationship

$$\zeta(1 - x) = \chi(x)\,\zeta(x),$$

where

$$\chi(x) = 2(2\pi)^{-x}\cos(\pi x/2)\,\Gamma(x).$$

He gave no proof but had verified the relationship to a point that, in his view, put 
the result beyond doubt. In the end the proof had to wait for Riemann and his 
complex generalization. By integrating around a second variable contour, which 
in the limit is the same as the original used to extend Zeta, the contour integral 
can be eliminated between two equations, leaving the above result, with real x 
generalized to complex z, and a form which conveniently reveals the important 
properties of the generalized Zeta function. Once again, the reader may wish 
to believe this or go to Appendix E for a proof. 
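A quick sanity check of the functional relationship at a few real points, using Python's `math.gamma` and the standard values $\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$, $\zeta(-1) = -1/12$ and $\zeta(-3) = 1/120$ (taken as known here rather than derived). This is an illustration, not a proof:

```python
import math

def chi(x):
    """Euler's factor chi(x) = 2 (2 pi)^(-x) cos(pi x / 2) Gamma(x), for real x > 0."""
    return 2 * (2 * math.pi) ** (-x) * math.cos(math.pi * x / 2) * math.gamma(x)

# standard values, assumed known for this check
zeta_2, zeta_4 = math.pi ** 2 / 6, math.pi ** 4 / 90
zeta_minus_1, zeta_minus_3 = -1 / 12, 1 / 120

print(chi(2) * zeta_2, zeta_minus_1)   # zeta(1 - 2) = chi(2) zeta(2)
print(chi(4) * zeta_4, zeta_minus_3)   # zeta(1 - 4) = chi(4) zeta(4)
print(chi(3))                          # essentially 0: cos(3 pi / 2) vanishes
print(chi(0.5))                        # essentially 1: both sides are zeta(1/2)
```

At $x = 3$ the cosine factor vanishes, which is the functional equation's way of forcing $\zeta(-2) = 0$; at $x = \tfrac12$ the factor equals 1, as it must, since both sides are then $\zeta(\tfrac12)$.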


16.6 The Zeros of Zeta 

If we look at a plot of the real, extended Zeta function (Figure 16.2), we can 
examine its behaviour for $x < 1$. The vertical asymptote at $x = 1$ is clear
enough but on this scale the behaviour along the negative real axis is obscure 
and we need to zoom in a little, and doing so suggests that the function is zero 
at every negative, even integer. 
Figure 16.2. The real, extended Zeta function.



Figure 16.3. Behaviour for x < 0. 


On p. 41 we saw that Euler had established Zeta’s behaviour at positive, even integers in that

$$\zeta(2x) = \sum_{r=1}^{\infty} \frac{1}{r^{2x}} = \frac{(-1)^{x+1}(2\pi)^{2x}}{2(2x)!}\,B_{2x} \quad \text{for } x = 1, 2, 3, \ldots,$$

where the $B_{2x}$ are the Bernoulli Numbers, but that the form of the Zeta function evaluated at odd positive integers (greater than 1, of course) remains a mystery to this day. In fact, the extended Zeta function is a little more compliant in that its exact value at zero and at every negative integer is known to be

$$\zeta(-x) = (-1)^x\,\frac{B_{x+1}}{x+1} \quad \text{for } x = 0, 1, 2, \ldots,$$

which means that $\zeta(0) = -\tfrac12$ and, since the other odd Bernoulli Numbers are
all zero, it must be that the extended Zeta function is zero at negative even 
integers: these are called the trivial zeros — but there are others, which are not 
nearly so trivial. 
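Both formulae are straightforward to evaluate once the Bernoulli Numbers are available. The sketch below generates them exactly from the standard recurrence $\sum_{k=0}^{m}\binom{m+1}{k}B_k = 0$; this choice of convention is our assumption, and it gives $B_1 = -\tfrac12$, the value the formula for $\zeta(-x)$ needs at $x = 0$. Function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(n_max):
    """B_0, ..., B_n_max from sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1 (gives B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

B = bernoulli(12)

def zeta_even(x):
    """Euler: zeta(2x) = (-1)^(x+1) (2 pi)^(2x) B_2x / (2 (2x)!), for x = 1, 2, 3, ..."""
    return (-1) ** (x + 1) * (2 * pi) ** (2 * x) * float(B[2 * x]) / (2 * factorial(2 * x))

def zeta_nonpositive(x):
    """Extended zeta at -x: (-1)^x B_{x+1} / (x + 1), for x = 0, 1, 2, ... (exact fraction)."""
    return (-1) ** x * B[x + 1] / (x + 1)

print(zeta_even(1), sum(r ** -2 for r in range(1, 100_001)))   # pi^2/6 vs a partial sum
print([str(zeta_nonpositive(x)) for x in range(6)])           # -1/2, -1/12, 0, 1/120, 0, -1/252
```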
Figure 16.4. The symmetry of Zeta’s non-trivial zeros.


The functional equation echoes this fact and also reveals a great deal more about the ‘non-trivial’ zeros of the Zeta function. If $z \in \{-2, -4, -6, \ldots\}$, then $\cos(\pi z/2) \neq 0$ but $\Gamma(z)$, and therefore $\chi(z)$, is infinite, whereas $\zeta(1 - z)$ is finite; the only reconciliation is that $\zeta(z) = 0$, and we have those trivial zeros again. The Euler product form of $\zeta(z)$ (as shown on p. 62), valid only for $\mathrm{Re}(z) > 1$, clearly cannot be zero. For any zeros that do exist, the functional relationship tells us that if $\zeta(z) = 0$ and $\chi(z)$ is finite, then $\zeta(1 - z) = 0$. Therefore, there can be no other zeros for $\mathrm{Re}(z) < 0$, as such a zero would necessarily spawn another with its real part greater than 1. Riemann argued that there is an infinite number of these non-trivial zeros, which have a strong symmetry. In the strip $0 < \mathrm{Re}(z) < 1$, $\zeta(z)$ is a single-valued analytic function which is real when $z$ is real; this is enough to mean that $(\zeta(z))^* = \zeta(z^*)$ (which is called the Schwarz Reflection Principle), and this means that $\zeta(z) = 0 \Leftrightarrow (\zeta(z))^* = 0 \Leftrightarrow \zeta(z^*) = 0$ (where $z^*$ is the complex conjugate of $z$), and the symmetry becomes fourfold. It is hardly obvious, but no zeros lie on the line $\mathrm{Re}(z) = 1$ and, using the functional equation, none can lie on $\mathrm{Re}(z) = 0$. This seemingly minor detail is what de la Vallée Poussin and Hadamard each established as an essential step in proving the Prime Number Theorem. In 1932 the eclectic, attractively eccentric American genius Norbert Wiener (1894-1964) showed that this result and the Prime Number Theorem are in fact entirely equivalent.

All non-trivial zeros of the Riemann Zeta function lie, then, symmetrically in the strip $0 < \mathrm{Re}(z) < 1$, which is known as the ‘critical strip’: the shaded region in Figure 16.4.

I list the first few of these non-trivial zeros (with positive imaginary part 
providing a natural order) in Table 16.1. The most striking feature is that the 
real part of each of these complex numbers is always 0.5: is this a representative
selection? No one knows, but all available evidence suggests so and we will 
be addressing that critical matter soon; no one knows what those trailing dots 
suggest either — irrational, transcendental, etc.? 
Table 16.1. Zeta's early non-trivial zeros.


0.5 + 14.134 725 141 734 693 790 457 251 983 562 470 270 784 257 115 699 243 ... i
0.5 + 21.022 039 638 771 554 992 628 479 593 896 902 777 334 340 524 902 781 ... i
0.5 + 25.010 857 580 145 688 763 213 790 992 562 821 818 659 549 672 557 996 ... i
0.5 + 30.424 876 125 859 513 210 311 897 530 584 091 320 181 560 023 715 440 ... i
0.5 + 32.935 061 587 739 189 690 662 368 964 074 903 488 812 715 603 517 039 ... i
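The tabulated values can be checked without any special library by evaluating the extended Zeta function through Euler–Maclaurin summation, a standard technique that is not the one Riemann used and is swapped in here purely for illustration. With $N = 20$ terms summed directly and twelve Bernoulli correction terms, the sketch below should be accurate to many decimal places at these heights; the names and the parameters $N$ and $M$ are our own choices.

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(n_max):
    """B_0, ..., B_n_max from sum_{k=0}^{m} C(m+1, k) B_k = 0 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B

B = bernoulli(24)

def zeta(s, N=20, M=12):
    """Extended zeta(s), s != 1, by Euler-Maclaurin summation:
    sum_{n<N} n^-s + N^(1-s)/(s-1) + N^-s/2
      + sum_{k=1}^{M} B_2k/(2k)! * s(s+1)...(s+2k-2) * N^(-s-2k+1)."""
    total = sum(n ** -s for n in range(1, N))
    total += N ** (1 - s) / (s - 1) + N ** -s / 2
    rising = s                                   # s(s+1)...(s+2k-2), starting with k = 1
    for k in range(1, M + 1):
        total += float(B[2 * k]) / factorial(2 * k) * rising * N ** (-s - 2 * k + 1)
        rising *= (s + 2 * k - 1) * (s + 2 * k)
    return total

ZEROS = [14.134725141734693, 21.022039638771555, 25.010857580145688,
         30.424876125859513, 32.935061587739189]
for g in ZEROS:
    print(g, abs(zeta(complex(0.5, g))))         # tiny at each tabulated zero
print(abs(zeta(complex(0.5, 14.0))))             # but not between them
```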


16.7 The Evaluation of $\Pi(x)$ and $\pi(x)$


With the Zeta function analytically continued and with the symmetry of its zeros established, Riemann used contour integration again to develop a very striking expression for $\Pi(x)$ involving a very important infinite series,

$$\Pi(x) = \mathrm{Li}(x) - \sum_{\rho} \mathrm{Li}(x^{\rho}) - \ln 2 + \int_x^{\infty} \frac{du}{u(u^2 - 1)\ln u}, \qquad x > 1. \qquad (16.1)$$


The main things to notice about the formula are that $\mathrm{Li}(x)$ appears together with a simple constant and another of those awkward integrals, which can be approximated to any accuracy for any given $x$; we also see an arresting series, which is summed over the infinity of non-trivial zeros $\rho$ of the extended Zeta function. His argument was not fully rigorous and we will not attempt to repeat it here, but if we accept this mathematical alchemy for the moment, we can sum over any finite number of the zeros to arrive at an approximation of $\Pi(x^{1/r})$ for any $x$ and the appropriate range of $r$, then use the expression

$$\pi(x) = \sum_{r=1}^{\infty} \frac{\mu(r)}{r}\,\Pi(x^{1/r})$$

to approximate $\pi(x)$; it seems a very tortuous route, but the diagrams in Figure 16.5 suggest that it is a very fruitful one.

To make the mathematics sensible, it is necessary to define the step function $\pi(x)$ at the vertical step at each prime to be the midpoint of the rise; with this we can see that this process is able to take into account the local fluctuations in the behaviour of $\pi(x)$. In fact, if we look more closely at the contribution made by each of the Zeta function’s non-trivial zeros we see that the $k$th zero $\rho_k$, taken together with its conjugate $\rho_k^*$, contributes

$$-\left(\mathrm{Li}(x^{\rho_k/r}) + \mathrm{Li}(x^{\rho_k^*/r})\right)$$

to the sum for $\Pi(x^{1/r})$ and therefore

$$\pi_k(x) = -\sum_{r=1}^{\infty} \frac{\mu(r)}{r}\left(\mathrm{Li}(x^{\rho_k/r}) + \mathrm{Li}(x^{\rho_k^*/r})\right)$$
to $\pi(x)$. Some of the first few of these component functions are shown in Figure 16.6; notice the vertical scales: the early zeros contribute more significantly than those further on. The whole process is reminiscent of Fourier analysis and indeed the connection is profound: we are looking at the ‘music of the primes’.

Figure 16.5. The Riemann approximation of the prime step function with (a) 10 terms and (b) 200 terms.

16.8 Misleading Evidence 

Looking back at Figure 15.8 on p. 176 shows that, at least up to $10^7$, $\mathrm{Li}(x) > \pi(x)$. This continues to be the case far, far beyond this value; in fact, even today all available numeric evidence continues to point to $\mathrm{Li}(x)$ being an overestimate of $\pi(x)$. Gauss always thought it to be true, and so did Riemann, who at the end of
his paper wrote, 

Indeed, in the comparison of $\mathrm{Li}(x)$ with the number of prime numbers
less than $x$, undertaken by Gauss and Goldschmidt and carried
through up to $x$ equals three million, this number has shown itself
out to be, in the first hundred thousand, always less than $\mathrm{Li}(x)$; in
fact the difference grows, with many fluctuations, gradually with $x$.

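$\mathrm{Li}(x)$ seemed too big and Riemann suggested that it is in fact a closer approximation to a weighted sum of the $\pi(x^{1/r})$ than it is to $\pi(x)$ alone; explicitly, that in his expression

$$\Pi(x) = \sum_{r=1}^{\infty} \frac{1}{r}\,\pi(x^{1/r})$$

the $\Pi(x)$ might reasonably be replaced by $\mathrm{Li}(x)$ itself to give

$$\mathrm{Li}(x) \approx \pi(x) + \tfrac12\pi(x^{1/2}) + \tfrac13\pi(x^{1/3}) + \cdots$$

and by Möbius Inversion

$$\pi(x) \approx \mathrm{Li}(x) - \tfrac12\mathrm{Li}(x^{1/2}) - \tfrac13\mathrm{Li}(x^{1/3}) - \tfrac15\mathrm{Li}(x^{1/5}) + \tfrac16\mathrm{Li}(x^{1/6}) + \cdots,$$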
an expression with dominant term $\mathrm{Li}(x)$, but including an infinite series of refinements. And so we have a final approximating function

$$R(x) = \sum_{r=1}^{\infty} \frac{\mu(r)}{r}\,\mathrm{Li}(x^{1/r}).$$

Figure 16.7 shows plots of this final approximation $R(x)$ with $\pi(x)$, and the difference between them fosters the hope that we do have an improvement for all $x$; for this to be true we clearly need that $\mathrm{Li}(x) > \pi(x)$, and unfortunately it is not always so.
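A sketch comparing $\pi(x)$, $\mathrm{Li}(x)$ and $R(x)$ at powers of 10. Two assumptions: we take $\mathrm{Li}(x)$ to be the principal-value integral $\mathrm{li}(x) = \int_0^x du/\ln u$ (which differs from $\int_2^x du/\ln u$ by about 1.045), and we evaluate $R(x)$ through Gram's series $R(x) = 1 + \sum_{k \ge 1} (\ln x)^k/(k\,k!\,\zeta(k+1))$, a known rearrangement of the Möbius sum that avoids evaluating $\mathrm{li}$ close to 1. The helper names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def li(x, terms=200):
    """li(x) for x > 1 via li(x) = gamma + ln ln x + sum_{n>=1} (ln x)^n / (n * n!)."""
    t = math.log(x)
    total, term = 0.0, 1.0
    for n in range(1, terms + 1):
        term *= t / n                    # term = t^n / n!
        total += term / n
    return EULER_GAMMA + math.log(t) + total

def zeta_real(s, N=1000):
    """zeta(s) for real s >= 2: partial sum plus a short Euler-Maclaurin tail."""
    head = sum(n ** -s for n in range(1, N))
    return head + N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s / 12 * N ** (-s - 1)

def riemann_R(x, terms=80):
    """R(x) via Gram's series R(x) = 1 + sum_{k>=1} (ln x)^k / (k * k! * zeta(k + 1))."""
    t = math.log(x)
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= t / k
        total += term / (k * zeta_real(k + 1))
    return total

def prime_count(n):
    """pi(n) by a sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return sum(is_prime)

for x in (10**2, 10**3, 10**4, 10**5, 10**6):
    print(x, prime_count(x), round(li(x), 1), round(riemann_R(x), 1))
```

Over this range $\mathrm{Li}(x)$ stays above $\pi(x)$ and $R(x)$ is much the closer of the two; Littlewood's theorem, discussed below, shows that neither pattern can be trusted indefinitely.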

The leading quotation at the beginning of Chapter 7 was from the pen of God- 
frey Harold Hardy, a complicated, modest, deeply gifted and influential number 
theorist, whom we have mentioned several times already. He is remembered for 
his own significant and individual contributions to mathematics but also those 
brought about by his collaboration with his great contemporary, John Edensor 
Littlewood (1885-1977). An incisive and elegant thumbnail picture of Littlewood appeared in a 1971/2 issue of the magazine Mathematical Spectrum:
Figure 16.7. The Riemann estimate.


Fellow and Copley medallist of The Royal Society, honorary doctor 
or member of many universities and academies, is the outstanding 
mathematical analyst of his generation. Born in 1885, he has been 
a Fellow of Trinity College Cambridge since 1908 and Rouse Ball 
Professor of Mathematics from 1928 to 1950. Littlewood’s papers 
in analysis and Number Theory, of which over 100 were written 
in collaboration with the late G. H. Hardy, have a striking power
which to mere mortals seems nothing short of miraculous. 

Hardy agreed: ‘. . . knew of no one else who could command such a combination of insight, technique and power . . .’.

More romantically, Hardy and Littlewood are forever linked with the name 
of the Indian genius Srinivasa Ramanujan (1887-1920), with the story of this 
remarkable association told in The Man Who Knew Infinity. An example of 
a typically extraordinary result of Ramanujan’s is an exact formula for the 
derivative of $\pi(x)$, with which we argued intuitively earlier. He proved that



r = 1 


where the derivative of the step function is defined in terms of the usual limit. 

All three took a profound interest in Number Theory in general and the Prime Number Theorem in particular, and it was Littlewood who, in 1914, proved that eventually $\pi(x)$ will overtake $\mathrm{Li}(x)$ and, more, that the two functions will swap in magnitude infinitely often from that point. Of course, this means that at these values $R(x)$ will not be the accurate approximation we would expect it to be. In The Distribution of Prime Numbers, Ingham commented, ‘This function ($R(x)$) approximates $\pi(x)$ with astonishing accuracy for all values of $x$ for which $\pi(x)$ has been calculated’. But he continues by remarking that, with Littlewood’s result, ‘its superiority over the function $\mathrm{Li}(x)$ is illusory’ and that ‘for special values of $x$ (as large as we please) the one approximation ($\mathrm{Li}(x)$) will deviate as widely as the other ($R(x)$) from the true value’. On the bright side, he also admits that ‘on average’ the first part of $R(x)$, $\mathrm{Li}(x) - \tfrac12\mathrm{Li}(x^{1/2})$, will be a better approximation to $\pi(x)$ than $\mathrm{Li}(x)$ alone, at least if something called
the Riemann Hypothesis is true. The obvious question to ask is: what is the smallest value of $x$ at which $\pi(x) > \mathrm{Li}(x)$? To that question there remains no definite answer. In that same paper, Littlewood also proved that the asymptotic oscillations of the difference between the two functions are of the order of at least $\mathrm{Li}(\sqrt{x})\,\ln\ln\ln x$, but he gave no explicit estimate of the whereabouts of that first sign change. Later, his student Stanley Skewes showed that it occurred before

$$10^{10^{10^{34}}},$$

which has become known as the ‘Skewes Number’, and which at the time was the biggest ‘useful’ number ever defined (‘Graham’s Number’, from the world of combinatorics, now dwarfs it). As of 2000, Carter Bays and Richard Hudson have improved the bound by showing that the first change of sign occurs before a mere $1.39822 \times 10^{316}$, a number still far beyond present-day computational reach.


16.9 The Von Mangoldt Explicit Formula— and How It Is Used to 
Prove the Prime Number Theorem 

It was left to others to recast Riemann’s thoughts with the severity that mathematics ultimately demands and in this case the most notable contributor was Von Mangoldt, who provided a rigorous proof of Riemann’s Equation (16.1) and who also established a similar expression for the $\psi$ function described on p. 183, which has overtaken $\Pi(x)$ in the study of the Prime Number Theorem. It is this form that we will look at in some detail.

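The complex form of the Euler identity is

$$\zeta(z) = \prod_{p \text{ prime}} \frac{1}{1 - p^{-z}},$$

valid for $\mathrm{Re}(z) > 1$, and so

$$\ln\zeta(z) = \ln\prod_{p \text{ prime}} \frac{1}{1 - p^{-z}} = -\sum_{p \text{ prime}} \ln(1 - p^{-z}) = -\sum_{p \text{ prime}} \ln(1 - e^{-z\ln p}).$$

Differentiating with respect to $z$ then gives

$$\frac{\zeta'(z)}{\zeta(z)} = -\sum_{p \text{ prime}} \frac{e^{-z\ln p}\ln p}{1 - e^{-z\ln p}} = -\sum_{p \text{ prime}} \frac{p^{-z}\ln p}{1 - p^{-z}} = -\sum_{p \text{ prime}}\sum_{r=1}^{\infty} \frac{\ln p}{p^{rz}}. \qquad (16.2)$$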
The last equality uses the sum of an infinite geometric series. We will use the $\psi(x) \sim x$ form of the Prime Number Theorem and, recalling the definition $\psi(x) = \sum_{p^r < x} \ln p$, we will naturally seek to extract the logarithmic part from the sum on the right-hand side of Equation (16.2), which can be done using the contour integral device

$$\frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{y^z}{z}\,dz = \begin{cases} 0 & 0 < y < 1,\\ \tfrac12 & y = 1,\\ 1 & y > 1, \end{cases}$$

where $c$ is a convenient positive real number. Once again, for those who are aware of them, the techniques of Fourier analysis are familiar.
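The device can be tested numerically by truncating the vertical line at height $\pm T$. On $z = c + it$ we have $dz = i\,dt$, so the integral becomes $\frac{1}{2\pi}\int_{-T}^{T} y^{c+it}/(c + it)\,dt$. The sketch below (illustrative names; $c = 1$, $T = 1000$ and a midpoint rule are arbitrary choices of ours) should return values close to 0, $\tfrac12$ and 1; the truncation error decays only like $1/T$, which is why rigorous proofs have to handle the tails with care.

```python
import cmath
import math

def perron_integral(y, c=1.0, T=1000.0, steps=200_000):
    """Approximate (1/2πi) ∫_{c-iT}^{c+iT} y^z / z dz (y > 0) by the midpoint rule.

    With z = c + it and dz = i dt this is (1/2π) ∫_{-T}^{T} y^(c+it) / (c+it) dt;
    the infinite limits of the true integral are truncated at ±T.
    """
    h = 2 * T / steps
    log_y = math.log(y)
    total = 0j
    for k in range(steps):
        z = complex(c, -T + (k + 0.5) * h)
        total += cmath.exp(z * log_y) / z
    return (total * h / (2 * math.pi)).real

for y in (0.5, 1.0, 2.0):
    print(y, round(perron_integral(y), 3))   # roughly 0, 1/2 and 1
```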

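Multiplying both sides of the expression (16.2) by $x^z/z$ and rearranging gives

$$\sum_{\substack{p \text{ prime}\\ r \ge 1}} \frac{x^z}{z}\,\frac{\ln p}{p^{rz}} = \sum_{\substack{p \text{ prime}\\ r \ge 1}} \ln p\,\frac{(x/p^r)^z}{z} = -\frac{\zeta'(z)}{\zeta(z)}\,\frac{x^z}{z},$$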

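and so integrating both sides along the contour gives

$$\sum_{\substack{p \text{ prime}\\ r \ge 1}} \ln p \cdot \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{(x/p^r)^z}{z}\,dz = -\frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{\zeta'(z)}{\zeta(z)}\,\frac{x^z}{z}\,dz.$$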

Now take $y = x/p^r$: the inner contour integral is 1 when $p^r < x$ (so that $y > 1$) and 0 when $p^r > x$ (so that $y < 1$), so only the prime powers below $x$ survive and we get

$$\psi(x) = \sum_{p^r < x} \ln p = -\frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{\zeta'(z)}{\zeta(z)}\,\frac{x^z}{z}\,dz;$$

here $x$ must not be a power of a prime, so that the case $y = 1$ never arises. The remaining contour integral is evaluated using the theory of residues, all of which have to be added together to arrive at the answer. The residues are best thought of as divided into four different categories, as in Table 16.2.
Table 16.2. The four types of residue.


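Singularity | Cause | Residue
$0$ | the factor $x^z/z$ | $-\zeta'(0)/\zeta(0) = -\ln 2\pi$
$1$ | pole of $\zeta$ | $x$
$-2, -4, -6, -8, \ldots$ | trivial zeros of $\zeta$ | $\tfrac12 x^{-2},\ \tfrac14 x^{-4},\ \tfrac16 x^{-6},\ \tfrac18 x^{-8}, \ldots$
$\rho$ | non-trivial zeros of $\zeta$ | $-x^{\rho}/\rho$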


p 


Yet again the Taylor series for ln appears, this time as

$$\tfrac12 x^{-2} + \tfrac14 x^{-4} + \tfrac16 x^{-6} + \tfrac18 x^{-8} + \cdots = -\tfrac12\ln(1 - x^{-2}),$$

and we therefore have

$$\psi(x) = x - \ln(2\pi) - \tfrac12\ln(1 - x^{-2}) - \sum_{\zeta(\rho) = 0} \frac{x^{\rho}}{\rho},$$

where the sum is over the non-trivial zeros, which is the equivalent of Riemann’s expression for $\Pi(x)$. It is known as the Von Mangoldt explicit formula and has to be the most important in the whole of analytic number theory. At first it looks contradictory to have a real function on the left in part made up from an infinite sum of complex numbers, but the roots do occur in conjugate pairs, which makes the terms, taken in such pairs, real.
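The explicit formula can be tried out at $x = 100$ with only the five zeros of Table 16.1, each taken with its conjugate. This is a sketch under stated assumptions: the sum over zeros is truncated, so we should only expect the zeros to pull the smooth part $x - \ln 2\pi - \tfrac12\ln(1 - x^{-2})$ towards the true $\psi(100)$, not to reproduce it exactly. The names are illustrative.

```python
import math

GAMMAS = [14.134725141734693, 21.022039638771555, 25.010857580145688,
          30.424876125859513, 32.935061587739189]   # imaginary parts from Table 16.1

def psi(x):
    """psi(x) = sum of ln p over prime powers p^r <= x (direct count)."""
    total = 0.0
    for p in range(2, int(x) + 1):
        if all(p % d for d in range(2, math.isqrt(p) + 1)):
            power = p
            while power <= x:
                total += math.log(p)
                power *= p
    return total

def explicit_formula(x, gammas):
    """x - ln 2π - ½ ln(1 - x^-2) - Σ x^ρ/ρ over the supplied zeros ρ = ½ ± iγ only;
    a zero and its conjugate together contribute 2 Re(x^ρ/ρ)."""
    value = x - math.log(2 * math.pi) - 0.5 * math.log(1 - x ** -2)
    for g in gammas:
        rho = complex(0.5, g)
        value -= 2 * (x ** rho / rho).real
    return value

x = 100   # not a prime power, as the formula requires
print(psi(x), explicit_formula(x, []), explicit_formula(x, GAMMAS))
```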

Now we can see the connection between the Prime Number Theorem and $\zeta$’s zeros. If we write $\rho = u + iv$, then $|x^{\rho}| = x^u$, and $u < 1$ would mean that, as $x \to \infty$, each error term in the series is of order less than $x$ and this would mean (with a bit more mathematical rigour) that $\psi(x)/x \to 1$, as required. That is, the real part of the non-trivial zeros of the extended Zeta function being less than 1 would imply the Prime Number Theorem and, as we have said, it was this fact that de la Vallée Poussin and Hadamard independently established.

16.10 The Riemann Hypothesis 

In his paper, Riemann defined a function ξ, related to ζ by

$$\xi(w) = \pi^{-z/2}(z - 1)\,\Gamma\!\left(\tfrac12 z + 1\right)\zeta(z),$$

where z = ½ + iw. Why? Really, because it is easier to handle than ζ(z). The factor
(z − 1) eliminates the problem with ζ(z) at z = 1 (recall from p. 41 that
(z − 1)ζ(z) → 1 as z → 1) and so ξ is analytic in the whole complex plane; it
Figure 16.8. The location of ξ’s early zeros.


is also not hard to check that ξ(z) = ξ(1 − z), and from the definition it’s clear
that the zeros of ξ correspond exactly to the non-trivial zeros of ζ. What is more, the
fact that all of the non-trivial zeros of ζ lie in 0 < Re(z) < 1 means that if we
write ξ(w) = ξ(u + iv) = 0, then ζ(z) = 0, where z = (½ − v) + iu, and so we
must have that 0 < ½ − v < 1 and so −½ < v < ½; that is, the zeros of ξ must
have imaginary parts lying between −½ and ½. Using the symmetry of the zeros
of ζ we need only consider those which have a positive imaginary part, making
u > 0 and therefore Re(w) > 0. This results in the region in Figure 16.8.

Riemann argued (again vaguely) that about

$$\frac{T}{2\pi}\ln\frac{T}{2\pi} - \frac{T}{2\pi}$$

of the zeros lie in such a rectangle of height T and as a test he calculated the real zeros, to find
that the number closely agreed with the counting function, which left little space
for any others. In his own words, ‘One now finds indeed approximately this
number of real roots within these limits, and it is very probable that all roots are
real.’ If this is the case, the real part of the zeros of ζ must be ½. Continuing, he
remarked, ‘Certainly one would wish for a stricter proof here; I have meanwhile
temporarily put aside the search for this after some fleeting futile attempts, as it
appears unnecessary for the next objective of my investigation.’ Which brings
us to the vaunted


Riemann Hypothesis 

The non-trivial zeros of the Riemann Zeta function 
all have real part one-half 
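Riemann's estimate for the number of zeros up to height T is easy to compare with the known zeros. The sketch assumes the ten ordinates below height 50 from standard tables; `riemann_estimate` and `tabulated_count` are hypothetical helper names.

```python
import math

# Assumption: ordinates of the first ten zeros above the real axis, from standard tables;
# there are no others below height 50.
GAMMAS = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062,
          37.586178, 40.918719, 43.327073, 48.005151, 49.773832]


def riemann_estimate(T: float) -> float:
    """Riemann's count (T/2pi) ln(T/2pi) - T/2pi of zeros with imaginary part in (0, T)."""
    u = T / (2 * math.pi)
    return u * math.log(u) - u


def tabulated_count(T: float) -> int:
    """How many of the tabulated ordinates lie below T (meaningful only for T <= 50)."""
    return sum(1 for g in GAMMAS if g < T)


if __name__ == "__main__":
    for T in (20, 30, 40, 50):
        print(f"T = {T}: estimate {riemann_estimate(T):6.3f}, actual {tabulated_count(T)}")
```

At T = 50 the estimate is about 8.5 against 10 actual zeros. The later Riemann–von Mangoldt form of the count adds a constant 7/8 and a small, irregular correction term, which accounts for much of the gap.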
Figure 16.9. The behaviour of Zeta on and near the critical line.

(a) z = ½ + xi; (b) z = 1 + xi.

In terms of Figure 16.4 on p. 195, the zeros all lie on the line of symmetry rather
than in any other part of the critical region.

The two plots in Figure 16.9 show the early behaviour of the function |ζ(z)|
for points on the critical line z = ½ + xi and for points on the parallel line
z = 1 + xi. They also show that ζ(z) has plenty of zeros at the start of the
vertical line Re(z) = ½ but none such on Re(z) = 1, although it can come
perilously close, as can be seen at a height of about 14.

Plotting 1/|ζ(z)| in Figure 16.10 gives another revealing glimpse of the non-trivial
zeros, which appear as spikes along the line Re(z) = ½. The trivial zeros
bring about the ‘mountain’ on the left.

In passing, Hadamard established the very satisfying form

$$\xi(z) = \tfrac12 e^{Az}\prod_{\xi(\rho)=0}\left(1 - \frac{z}{\rho}\right)e^{z/\rho},$$

where $A = -\tfrac12\gamma - 1 + \tfrac12\ln 4\pi$.

16.11 Why Is the Riemann Hypothesis Important?

The Riemann Hypothesis states that all non-trivial roots of the Zeta function
have real part ½, a far stronger condition than the one required for the proof of the

Figure 16.10. A three-dimensional view of Zeta’s early zeros.

Prime Number Theorem, which merely requires that none have real part 1. The
immediate importance of the conjecture is in the measurement of the size of the
error involved in the approximation of π(x) by Li(x), but it strikes much deeper
and into the greatest depths of mathematics, with the error involved in many
important asymptotic formulae also governed by it: for example, the weaker
form of the Goldbach Conjecture, which states that every odd number greater than 5 is the sum
of three primes, is implied by a generalized form of it. Fields Medallist Enrico Bombieri has said that
‘The failure of the Riemann Hypothesis would create havoc in the distribution
of prime numbers’. Since the Riemann Hypothesis is involved with the size of
the error in approximating ψ(x) by x, it is therefore involved with the error in
approximating π(x) by Li(x). To be exact, in 1901 von Koch proved that, if the
Riemann Hypothesis is true, the known estimate $\pi(x) = \mathrm{Li}(x) + O\bigl(x e^{-c\sqrt{\ln x}}\bigr)$
would become $\pi(x) = \mathrm{Li}(x) + O(\sqrt{x}\,\ln x)$, which Bombieri has commented
would be hard to improve on significantly, given Littlewood’s result that the
degree of oscillation of π(x) − Li(x) is asymptotically of the order $\mathrm{Li}(\sqrt{x})\,\ln\ln\ln x$.
Figure 16.11 gives some sort of idea of the difference between the
size of the errors with and without the Riemann Hypothesis.
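To get a feel for the sizes in von Koch's estimate, the sketch below computes π(x) by sieving and Li(x) from the series li(x) = γ + ln ln x + Σ_{k≥1} (ln x)^k/(k·k!), taking Li(x) = li(x) − li(2). The series and the numerical value of γ are standard facts assumed here rather than results from this section; the function names are illustrative (`Li` is capitalized only to match the text's notation).

```python
import math

EULER_GAMMA = 0.5772156649015329


def li(x: float, terms: int = 200) -> float:
    """li(x) for x > 1, from the series gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!)."""
    L = math.log(x)
    total, power_over_factorial = 0.0, 1.0
    for k in range(1, terms + 1):
        power_over_factorial *= L / k  # now equals L^k / k!
        total += power_over_factorial / k
    return EULER_GAMMA + math.log(L) + total


def Li(x: float) -> float:
    """Offset logarithmic integral: the integral of 1/ln t from 2 to x, i.e. li(x) - li(2)."""
    return li(x) - li(2.0)


def prime_pi(limit: int) -> int:
    """pi(limit), the number of primes <= limit, by a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sum(is_prime)


if __name__ == "__main__":
    for k in range(3, 7):
        x = 10 ** k
        error = prime_pi(x) - Li(x)
        bound = math.sqrt(x) * math.log(x)
        print(f"x = 10^{k}: pi(x) - Li(x) = {error:9.2f},  sqrt(x) ln x = {bound:10.1f}")
```

In this range the gap π(x) − Li(x) stays far inside √x ln x; a finite computation says nothing, of course, about whether the hypothesis holds.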

Proving the Riemann Hypothesis has subsequently become the greatest problem
in mathematics, but it has largely resisted attempts by some of the best
mathematicians of the 20th century to gain significant headway with it.

16.12 Real Alternatives

The uniqueness theorem allows us the freedom to extend Zeta’s definition in
any way we please and various methods have been used to do just that, including
the use of Euler–Maclaurin summation; of course, Riemann used contour
integration, which reveals a great deal about the nature of the extended function.
Another approach uses the generalized alternating harmonic series, the
‘alternating Zeta function’, defined by

$$\zeta_a(z) = \sum_{r=1}^{\infty}\frac{(-1)^r}{r^z},$$

which converges in the bigger region Re(z) > 0.

We can write this as

$$\zeta(z) = \frac{\zeta_a(z)}{2^{1-z} - 1},$$

defined for Re(z) > 0.

The extension is made complete using yet another technique of Euler’s, 
‘Euler’s series transformation’, and this results in 


$$\zeta(z) = \frac{1}{1 - 2^{1-z}}\sum_{r=0}^{\infty}\frac{1}{2^{r+1}}\sum_{k=0}^{r}(-1)^k\binom{r}{k}(k+1)^{-z} \qquad (16.3)$$

for z ≠ 1.
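Because (16.3) converges for every z ≠ 1, it gives a compact way to evaluate the extended Zeta function numerically. The sketch truncates it after the term r = n, with n = 200 an assumed default rather than a tuned value. As a design choice, it swaps the two finite sums so that the weights $c_k = \sum_{r=k}^{n}\binom{r}{k}2^{-(r+1)}$ are computed once, exactly in integer arithmetic, and reused for every z; this is algebraically the same truncated series but avoids adding up huge alternating binomial terms. The names `euler_weights` and `zeta` are illustrative.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def euler_weights(n: int) -> tuple:
    """c_k = sum_{r=k}^{n} C(r, k) / 2^(r+1) for k = 0, ..., n.

    Swapping the two finite sums in (16.3), truncated after r = n, gives
    sum_{k=0}^{n} (-1)^k c_k (k+1)^(-z); the weights do not depend on z.
    """
    denominator = 2 ** (n + 1)
    return tuple(
        sum(math.comb(r, k) * 2 ** (n - r) for r in range(k, n + 1)) / denominator
        for k in range(n + 1)
    )


def zeta(z: complex, n: int = 200) -> complex:
    """Riemann zeta from the series (16.3), truncated after the r = n term.

    Assumption: n = 200 is ample for |Im z| up to about 50 and moderate Re z;
    this is an illustration, not a rigorously error-controlled routine.
    """
    z = complex(z)
    factor = 1 - 2 ** (1 - z)
    if factor == 0:
        raise ValueError("zeta has a pole at z = 1")
    eta = sum((-1) ** k * c * (k + 1) ** (-z) for k, c in enumerate(euler_weights(n)))
    return eta / factor


if __name__ == "__main__":
    print(zeta(2), math.pi ** 2 / 6)          # the Basel value, for comparison
    print(zeta(-1))                           # a value beyond Re(z) > 0
    print(abs(zeta(0.5 + 14.134725142j)))     # near the first non-trivial zero
```

The value at z = −1 agrees with −1/12, and the value near ½ + 14.134725i is tiny, consistent with the first zero in Table 16.1; these spot checks say nothing definite about accuracy elsewhere.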

It seems light years away from the contour integral form, but remember the
uniqueness theorem for analytic extension! We can use the extension (16.3) to
give a tantalizingly simple reformulation of the Riemann Hypothesis without
complex numbers appearing at all. Writing z = a + ib and using standard methods,

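$$r^z = r^{a+ib} = r^a r^{ib} = r^a e^{ib\ln r} = r^a\bigl(\cos(b\ln r) + i\sin(b\ln r)\bigr)$$

and so

$$\frac{1}{r^z} = \frac{1}{r^a}\bigl(\cos(b\ln r) - i\sin(b\ln r)\bigr),$$

which means that, if z = a + ib is a zero of ζ with 0 < a < 1,

$$\sum_{r=1}^{\infty}\frac{(-1)^r}{r^a}\bigl(\cos(b\ln r) - i\sin(b\ln r)\bigr) = 0.$$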

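Equating real and imaginary parts brings us to the very tempting reformulation:

If $\sum_{r=1}^{\infty}\frac{(-1)^r}{r^a}\cos(b\ln r) = 0$ and $\sum_{r=1}^{\infty}\frac{(-1)^r}{r^a}\sin(b\ln r) = 0$ for real numbers a and b with 0 < a < 1, then a = ½.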
The reader may wish to check this using the early zeros given in Table 16.1 
on p. 196. It seems extraordinary that the most famous unsolved problem in 
the whole of mathematics can be phrased so that it involves the simplest of 
mathematical ideas: summation, trigonometry, logarithms and of course, if the 
conjecture is true, Christof Rudolff’s sign. It all sounds so easy to become 
the most famous name in the mathematical world! 
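For readers who want to try the suggested check, the two real series can be summed directly. The sketch averages consecutive partial sums, a simple device for slowly convergent alternating series that is assumed (not proved) to be accurate enough here; the ordinate 14.134725142 of the first zero is taken from standard tables, and `alternating_sums` is a hypothetical name.

```python
import math


def alternating_sums(a: float, b: float, n_terms: int = 100_000) -> tuple:
    """Approximate the two real series of the reformulation,

        C = sum_{r>=1} (-1)^r r^(-a) cos(b ln r),  S = sum_{r>=1} (-1)^r r^(-a) sin(b ln r).

    Both converge only slowly, so the partial sums after n_terms and n_terms + 1
    terms are averaged, a simple acceleration device assumed adequate here.
    """
    c = s = 0.0
    prev_c = prev_s = 0.0
    for r in range(1, n_terms + 2):
        prev_c, prev_s = c, s
        weight = (-1) ** r / r ** a
        angle = b * math.log(r)
        c += weight * math.cos(angle)
        s += weight * math.sin(angle)
    return (c + prev_c) / 2, (s + prev_s) / 2


if __name__ == "__main__":
    gamma_1 = 14.134725142  # assumed: ordinate of the first zero, from standard tables
    for a in (0.4, 0.5, 0.6):
        c, s = alternating_sums(a, gamma_1)
        print(f"a = {a}: C = {c:+.2e}, S = {s:+.2e}")
```

At a = ½ both sums come out close to zero, while moving a off ½ with the same b leaves them clearly non-zero; this probes a single zero and proves nothing.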

There are other, equivalent real formulations of the Riemann Hypothesis. For
example, asymptotically the exact values of the integers ⌊Li(x)⌋ and π(x) must
agree on ‘about’ half of their digits. Also, with σ(n) the sum of the divisors of
n, it is equivalent to $\sigma(n) < e^{\gamma} n\ln\ln n$ for $n \ge 5041$, or to $\sigma(n) \le H_n + e^{H_n}\ln H_n$ for
$n \ge 1$, with equality only for n = 1. We will content ourselves with a detailed
look at one more celebrated reformulation.
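Both divisor-sum criteria can be tested directly for small n. The sketch computes σ(n) with a simple sieve and looks for violations up to 20 000, an arbitrary limit chosen for speed; the function names are illustrative. A finite check like this is evidence, not proof, though it does show why the first inequality is stated only from 5041 on: 5040 itself fails it.

```python
import math

EULER_GAMMA = 0.5772156649015329


def divisor_sums(limit: int) -> list:
    """sigma[n] = sum of the divisors of n for 1 <= n <= limit (sigma[0] is unused)."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sigma[multiple] += d
    return sigma


def check_criteria(limit: int) -> tuple:
    """Return the n <= limit violating each inequality.

    First:  sigma(n) < e^gamma n ln ln n       (tested for n >= 3, where ln ln n > 0)
    Second: sigma(n) < H_n + e^{H_n} ln H_n    (tested for n >= 2; equality holds at n = 1)
    """
    sigma = divisor_sums(limit)
    e_gamma = math.exp(EULER_GAMMA)
    first_failures, second_failures = [], []
    harmonic = 0.0
    for n in range(1, limit + 1):
        harmonic += 1 / n  # H_n, built up incrementally
        if n >= 3 and not sigma[n] < e_gamma * n * math.log(math.log(n)):
            first_failures.append(n)
        if n >= 2 and not sigma[n] < harmonic + math.exp(harmonic) * math.log(harmonic):
            second_failures.append(n)
    return first_failures, second_failures


if __name__ == "__main__":
    first, second = check_criteria(20_000)
    print("first inequality fails at:", first)
    print("second inequality fails at:", second)
```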

16.13 A Back Route to Immortality: Partly Closed

Any integer can be written as the product of a square and a square-free component
and in Chapter 3 we saw this simple fact put to significant use by Erdős. Of
course, any particular integer might be factored as a combination of square and
square-free, for example, $2^3 \times 3^5 \times 7 \times 11^2 = (2 \times 3^2 \times 11)^2(2 \times 3 \times 7)$, or it
could be a perfect square, $3^6 \times 5^4 \times 13^2 = (3^3 \times 5^2 \times 13)^2$, or it could be entirely
square-free, with the primes appearing just once, for example, $2 \times 5 \times 13 \times 17$.
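The split into square and square-free parts is mechanical: divide out each prime, send pairs of it to the square and any leftover single copy to the square-free part. A minimal trial-division sketch (the function name is illustrative) reproduces the three examples above.

```python
def square_times_squarefree(n: int) -> tuple:
    """Return (s, q) with n = s**2 * q and q square-free, by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    s, q = 1, 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        s *= p ** (exponent // 2)  # pairs of p go into the square
        q *= p ** (exponent % 2)   # a leftover single p goes into the square-free part
        p += 1
    q *= n  # what remains is 1 or a single prime
    return s, q


if __name__ == "__main__":
    for n in (2**3 * 3**5 * 7 * 11**2, 3**6 * 5**4 * 13**2, 2 * 5 * 13 * 17):
        print(n, square_times_squarefree(n))
```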
The Möbius function μ, mentioned earlier, is used to discriminate between the
types of factorization that are possible. Recall its definition:

$$\mu(r) = \begin{cases} 0 & r \text{ has a repeated factor},\\ 1 & r \text{ has an even number of prime factors},\\ -1 & r \text{ has an odd number of prime factors}. \end{cases}$$
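The definition translates directly into code. This trial-division version of μ is a sketch suitable for small arguments; it returns 0 as soon as a repeated prime factor is found.

```python
def mobius(r: int) -> int:
    """Mobius function: 0 if a prime divides r twice, else (-1)^(number of prime factors)."""
    if r < 1:
        raise ValueError("mu is defined for positive integers")
    sign = 1
    p = 2
    while p * p <= r:
        if r % p == 0:
            r //= p
            if r % p == 0:
                return 0  # repeated factor p
            sign = -sign
        p += 1
    if r > 1:
        sign = -sign  # one remaining prime factor
    return sign


if __name__ == "__main__":
    print([mobius(r) for r in range(1, 21)])
```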


Now suppose that we consider all square-free integers. It is reasonable to suppose
that the Almighty has divided them pretty equally so that μ will take its
values of +1 and −1 equally often (in fact, it can be shown that $P(\mu(r) = 1) = P(\mu(r) = -1) = 3/\pi^2$,
and therefore $P(\mu(r) = 0) = 1 - 6/\pi^2$, giving a
final appearance of that ubiquitous number). Having said this, we would expect
some fluctuation in the count as we move along the list of integers, just as
we have expected fluctuations in the accuracy of Li(x) approximating π(x)
or any other asymptotic approximation. But how big would we expect those
fluctuations to be? The size of them is measured by the absolute value of the
Mertens function $M(x) = \sum_{r \le x}\mu(r)$, shown in Figure 16.12.

It is clearly erratic but even so, in 1885 Thomas Stieltjes (1856–1894), ‘the
father of the analytic theory of continued fractions’, claimed in a letter to his
frequent correspondent Charles Hermite (1822–1901) that $M(x)x^{-1/2}$ stays
within two fixed bounds, no matter how large x may be; he added (in parentheses)
that the bounds could probably be taken to be +1 and −1. In saying this, he was
suggesting that $|M(x)| < \sqrt{x}$. In 1897, Mertens published a paper containing a
table 50 pages long giving values of μ(r) and M(r) for r up to 10 000 and, based
on this evidence, claimed that Stieltjes’s stronger estimate was ‘very probable’; and so
$|M(x)| < \sqrt{x}$, $x > 1$, passed into mathematics as the ‘Mertens Conjecture’.
In a series of papers around the turn of the century, von Sterneck published values
of M(r) for r up to 1 000 000 and on that evidence conjectured the stronger
$|M(x)| < 0.5\sqrt{x}$, $x > 200$.
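The kind of evidence Mertens and von Sterneck relied on is easy to reproduce on a smaller scale. The sketch tabulates M(x) up to 10^5 (an arbitrary limit) and reports where |M(x)|/√x is largest, first over x > 1 and then over x > 200; the names are illustrative, and such evidence can be badly misleading.

```python
import math


def mertens_values(limit: int) -> list:
    """M[x] = sum of mu(r) for r <= x, for 0 <= x <= limit, via a Mobius sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = 0
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]           # one more distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0                # repeated factor
    M = [0] * (limit + 1)
    for x in range(1, limit + 1):
        M[x] = M[x - 1] + mu[x]
    return M


def largest_ratio(M: list, start: int) -> tuple:
    """The x >= start maximizing |M(x)| / sqrt(x), together with that ratio."""
    best = max(range(start, len(M)), key=lambda x: abs(M[x]) / math.sqrt(x))
    return best, abs(M[best]) / math.sqrt(best)


if __name__ == "__main__":
    M = mertens_values(100_000)
    print("max |M(x)|/sqrt(x), 1 < x <= 10^5:", largest_ratio(M, 2))
    print("max |M(x)|/sqrt(x), 200 < x <= 10^5:", largest_ratio(M, 201))
```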

Stieltjes’s proof never appeared because the assertion is wrong, which means
that the von Sterneck assertion is wrong too, and even the weaker forms, with
larger bounds, might be doomed to failure also. It took until 1963 to disprove the
stronger form, when Gerhard Neubauer found that with x = 7 725 038 629, the
0.5 but not the 1 boundary is broken. Not until 1985 was the original conjecture
dispatched, when A. M. Odlyzko and H. J. J. te Riele proved that eventually
the positive and the negative barriers are broken. (With that erratic behaviour,
it is hardly surprising that they formulated their result in terms of the ideas
on p. 113; to be exact, they showed that $\limsup_{x\to\infty} M(x)x^{-1/2} > 1.06$ and
$\liminf_{x\to\infty} M(x)x^{-1/2} < -1.009$.) Their proof was one of existence and as
such provided no estimate, let alone value, for such an x; in the same year János
Pintz proved that the first counterexample must be less than $e^{3.21 \times 10^{64}}$: big,
but bear in mind the Skewes and Graham numbers!

This all seems a shame, with numeric evidence once again leading intuition 
astray; a few million, a few billion, a few trillion. . . do not mean much here; in 
number theory, big really can mean BIG! 

What has it to do with the Riemann Hypothesis? Its truth would have implied
it. In fact, the truth of $|M(x)| < C\sqrt{x}$ for any constant C would imply it, and
that remains an open question; small wonder that the conjecture has attracted
the attention that has led to two of its forms being disproved.

The Zeta function is intimately related to the Möbius function in that

$$\frac{1}{\zeta(z)} = \sum_{r=1}^{\infty}\frac{\mu(r)}{r^z} \quad \text{for } \operatorname{Re}(z) > 1.$$


We will not prove this fact, but it is another standard result of number theory. 
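At z = 2 the identity predicts Σ μ(r)/r² = 1/ζ(2) = 6/π², which a partial sum readily confirms. The sketch reuses a Möbius sieve; the value ζ(3) ≈ 1.2020569 used in the second check is quoted from standard tables, and the function names are illustrative.

```python
import math


def mobius_sieve(limit: int) -> list:
    """mu[r] for 1 <= r <= limit (mu[0] is set to 0 and unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = 0
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]           # one more distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0                # repeated factor
    return mu


def inverse_zeta_partial(z: float, limit: int) -> float:
    """Partial sum of mu(r) / r^z for r <= limit, approximating 1/zeta(z) when z > 1."""
    mu = mobius_sieve(limit)
    return math.fsum(mu[r] / r ** z for r in range(1, limit + 1))


if __name__ == "__main__":
    print(inverse_zeta_partial(2, 100_000), 6 / math.pi ** 2)
```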
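With one last look at complex function theory and with this result at our disposal,
we can see that tantalizing connection, given that we define M(0) = 0:

$$\frac{1}{\zeta(z)} = \sum_{r=1}^{\infty}\frac{M(r) - M(r-1)}{r^z} = \sum_{r=1}^{\infty}M(r)\left(\frac{1}{r^z} - \frac{1}{(r+1)^z}\right) = z\sum_{r=1}^{\infty}M(r)\int_r^{r+1}\frac{dx}{x^{z+1}} = z\int_1^{\infty}\frac{M(x)}{x^{z+1}}\,dx,$$

since M(x) is constant on each interval [r, r + 1).
If the Mertens conjecture is true, then

$$\left|z\int_1^{\infty}\frac{M(x)}{x^{z+1}}\,dx\right| \le |z|\int_1^{\infty}\frac{\sqrt{x}}{x^{\operatorname{Re}(z)+1}}\,dx = |z|\int_1^{\infty}\frac{dx}{x^{\operatorname{Re}(z)+1/2}}.$$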
Figure 16.13. Early evidence for Stieltjes’s conjecture.


The last integral would converge provided that Re(z) + ½ > 1, which means
that Re(z) > ½. If this is so, it would define a function analytic in Re(z) > ½,
which would give an analytic continuation of 1/ζ(z) from Re(z) > 1 in the
original formula to Re(z) > ½ (that sleight of hand again). This would mean
that 1/ζ(z) is defined for Re(z) > ½ (and therefore that ζ(z) can have no zeros
there); by the symmetry of the non-trivial zeros about the critical line, none could exist
with 0 < Re(z) < ½ either, so they all must lie on
Re(z) = ½ and that of course is the Riemann Hypothesis!
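The step that rewrites 1/ζ(z) as $z\int_1^\infty M(x)x^{-z-1}\,dx$ relies on M(x) being constant on each interval [r, r + 1), where $z\int_r^{r+1}x^{-z-1}\,dx = r^{-z} - (r+1)^{-z}$. The sketch uses exactly this to evaluate the integral as a sum and compares it with 1/ζ(z) at z = 2 and z = 3/2, where the original formula already converges; the value ζ(3/2) ≈ 2.6123753 is quoted from standard tables, and the names are illustrative. No computation of this kind can reach Re(z) ≤ 1, which is where a Mertens-type bound would do its work.

```python
import math


def mertens_values(limit: int) -> list:
    """M[x] = sum of mu(r) for r <= x, for 0 <= x <= limit, via a Mobius sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = 0
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    M = [0] * (limit + 1)
    for x in range(1, limit + 1):
        M[x] = M[x - 1] + mu[x]
    return M


def mertens_integral(z: float, limit: int) -> float:
    """z * integral_1^(limit+1) M(x) x^(-z-1) dx, computed exactly interval by interval.

    On [r, r + 1) the step function M(x) equals M(r), and
    z * integral_r^(r+1) x^(-z-1) dx = r^(-z) - (r + 1)^(-z).
    """
    M = mertens_values(limit)
    return math.fsum(M[r] * (r ** -z - (r + 1) ** -z) for r in range(1, limit + 1))


if __name__ == "__main__":
    print(mertens_integral(2, 100_000), 6 / math.pi ** 2)
    print(mertens_integral(1.5, 100_000), 1 / 2.612375348685488)
```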

16.14 Incentives, Old and New 

Mathematical Problems 

Lecture delivered before the International Congress of
Mathematicians at Paris in 1900

By Professor David Hilbert 

Who of us would not be glad to lift the veil behind which the future 
lies hidden; to cast a glance at the next advances of our science and 
at the secrets of its development during future centuries? What 
particular goals will there be toward which the leading mathemat- 
ical spirits of coming generations will strive? What new methods 
and new facts in the wide and rich field of mathematical thought 
will the new centuries disclose? History teaches the continuity of 
the development of science. We know that every age has its own 
problems, which the following age either solves or casts aside as 
profitless and replaces by new ones. If we would obtain an idea 
of the probable development of mathematical knowledge in the 
immediate future, we must let the unsettled questions pass before 
our minds and look over the problems which the science of today 
sets and whose solution we expect from the future. To such a review 
of problems the present day, lying at the meeting of the centuries,
seems to me well adapted. For the close of a great epoch not only
invites us to look back into the past but also directs our thoughts to 
the unknown future. The deep significance of certain problems for 
the advance of mathematical science in general and the important 
role which they play in the work of the individual investigator are 
not to be denied. As long as a branch of science offers an abundance 
of problems, so long is it alive; a lack of problems foreshadows 
extinction or the cessation of independent development. Just as 
every human undertaking pursues certain objects, so also mathe- 
matical research requires its problems. It is by the solution of prob- 
lems that the investigator tests the temper of his steel; he finds new 
methods and new outlooks, and gains a wider and freer horizon. 

It is difficult and often impossible to judge the value of a problem 
correctly in advance; for the final award depends upon the gain 
which science obtains from the problem. Nevertheless we can ask 
whether there are general criteria which mark a good mathemati- 
cal problem. An old French mathematician said: ‘A mathematical 
theory is not to be considered complete until you have made it so 
clear that you can explain it to the first man whom you meet on the 
street.’ This clearness and ease of comprehension, here insisted 
on for a mathematical theory, I should still more demand for a 
mathematical problem if it is to be perfect; for what is clear and 
easily comprehended attracts, the complicated repels us. Moreover 
a mathematical problem should be difficult in order to entice us, yet 
not completely inaccessible, lest it mock at our efforts. It should be 
to us a guide post on the mazy paths to hidden truths, and ultimately 
a reminder of our pleasure in the successful solution. 

On 8 August 1900 David Hilbert (1862–1943) rose to a lectern in the Sorbonne
to give what is probably the most famous lecture ever delivered by a
mathematician (although Andrew Wiles’s series of lectures, in which he established
a form of the Taniyama–Shimura conjecture and in particular Fermat’s
Last Theorem, admittedly with a later corrected error, might vie for equal
renown). Hilbert, even with the formidable competition of the likes of Felix
Klein and Henri Poincaré, was the most acclaimed mathematician of his day,
described by one of his students (a future Nobel Laureate) in the words, ‘. . . lives
in my memory as perhaps the greatest genius I ever laid eyes on.’ He had been
invited to give one of the major addresses at the second International Congress 
of Mathematicians and he chose to use the opportunity to chart a course for 
20th-century mathematics, in part by posing a series of 23 problems, the inves- 
tigation or solution of which would in his view lead the way to mathematical 
progress. The address opened with the lines above and continued by focusing 
on 10 of the problems; there was no apparent order to his list but on it, and one
discussed in the address, was problem number eight. 

8. Problems of prime numbers 

Essential progress in the theory of the distribution of prime numbers
has lately been made by Hadamard, de la Vallée Poussin,
von Mangoldt and others. For the complete solution, however, of
the problems set us by Riemann’s paper ‘Ueber die Anzahl der
Primzahlen unter einer gegebenen Grösse’, it still remains to prove
the correctness of an exceedingly important statement of Riemann,
viz., that the zero points of the function ζ(s) defined by the series

$$\zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots$$

all have the real part 1/2, except the well-known negative integral
real zeros. As soon as this proof has been successfully established,
the next problem would consist in testing more exactly Riemann’s
infinite series for the number of primes below a given number and,
especially, to decide whether the difference between the number
of primes below a number x and the integral logarithm of x does in
fact become infinite of an order not greater than ½ in x. Further, we
should determine whether the occasional condensation of prime
numbers which has been noticed in counting primes is really due
to those terms of Riemann’s formula which depend upon the first
complex zeros of the function ζ(s).

Hilbert’s gigantic standing gave huge impetus in the mathematical world to
address the problems in the list; a reputation could be made by success in any
of them. Those who did meet with success, or who contributed significantly
to success, were to become known as members of the ‘honours class’ of
mathematicians. Of the 23 problems, 8 were of a purely investigative nature and
12 of the remaining 15 have been completely resolved. Only problem number
8 preserves its mystery almost completely and a century later it remains, in a
practical sense, untouched.

In 1998 the Fields Medallist Stephen Smale put forward his own list of 18
problems in the same spirit as Hilbert and on 13 February 2002 the solution of
the 14th on the list was published by W. Tucker. So far this is the only one of
them to be solved, and number one on the list is the Riemann Hypothesis.

With the dawn of another millennium, a new incentive has been provided 
by the Clay Mathematics Institute in that they have offered one million dollars 
each for the solution of seven open questions, one of which is the Riemann 
Hypothesis. 
16.15 Progress

There has, of course, been progress. In 1914 Hardy wrote the paper ‘Sur les
zéros de la fonction ζ(s) de Riemann’ in which he showed that an infinite number
of the non-trivial zeros lie on the critical line Re(z) = ½. In 1921, he and
Littlewood together proved the far stronger result that, for some positive constant
A, ζ(½ + iy) has at least AY zeros in each interval −Y ≤ y ≤ Y. Selberg,
in 1942, improved Hardy’s original result to show that a positive proportion of
all the non-trivial zeros lie on the critical line. (This is a subtle but important
distinction. For example, ℤ is infinite but the precise ‘measure’ of its size compared
with ℝ is 0.) Conrey improved this in 1989, showing that at least 40%
of the zeros lie on the line. The width of the critical region has been squeezed,
but not to zero, which you may think is pretty convincing evidence, but recall
the two conjectures mentioned earlier; Littlewood was far from convinced: he
conjectured that the Riemann Hypothesis is false!

Since there is known to be an infinite number of non-trivial zeros with no
discernible pattern to them, enumerating them is not an option, other than in the
hope of finding one not on the critical line. To this end, in 1903 J.-P. Gram used
Euler–Maclaurin summation to prove that the conjecture is true up to a height
of 50, that is, for Im(z) < 50, but Euler–Maclaurin summation has long been
superseded by a clever technique on which we will touch lightly.

Recall that ξ(z) = ξ(1 − z) and also that the function is analytic and real for
real z. This means that we can use the Schwarz reflection formula again and, in
particular, we have

$$\bigl(\xi(\tfrac12 + it)\bigr)^* = \xi\bigl((\tfrac12 + it)^*\bigr) = \xi(\tfrac12 - it) = \xi\bigl(1 - (\tfrac12 + it)\bigr) = \xi(\tfrac12 + it)$$

and the only complex numbers equal to their own conjugates are real. We have
that ξ is real on the critical line and so to look for a zero on the line is to look
for a change in sign of the ξ function. (The precise method for achieving this
is technical and uses something known as Gram’s Law.) Now all we need to
do is to provide an accurate count of how many zeros exist up to a certain 
height and compare that number with the count of the number of zeros on the 
critical line: any discrepancy proves the hypothesis false. And this takes us 
to our final genius. Recall that Ada Lovelace thought an appropriate task for 
Babbage’s Calculating Engine was the evaluation of the Bernoulli Numbers; 
the eccentric and pitifully treated British genius Alan Turing (1912-1954) felt 
that locating zeros of the Riemann Zeta function was an appropriate task for 
the Calculating Engine’s successor — the electronic computer — the intellectual 
form of which he conceived. Turing is most generally remembered for his 
immense contributions to the breaking of the German military Enigma Code 
at Bletchley Park, England, in World War II; the gripping story of ‘Ultra’ has 
been told by many now that it is not shrouded by the Official Secrets Act, the 
intellectual ‘cream of the cream’ acting in unison to achieve what was thought 
Figure 16.14. Proof without words: the Riemann Hypothesis.

to be impossible. Even in that most rarefied atmosphere Turing, the ‘Prof’, 
was special; his own story has been told in Andrew Hodges’s Alan Turing; the 
Enigma and Jon Agar’s Turing and the Universal Machine (among others), and 
we will merely give fleeting mention of one of his many brilliant ideas. 

In 1948 he was at Manchester University, belatedly joining the team who
constructed the first electronic, stored-program computer and it was from here
that he put forward his seminal ideas on machine intelligence. By 1951 the
machine had graduated to the ‘Blue Pig’ or MUC, the Manchester University
Computer, a massive collection of wiring and valves (concealed in metal cupboards)
which was set to many tasks, from singing and producing doggerel to
testing for the zeros of ζ(z). At night, when it had no other work, Turing would
set it to work widening the search and using a formula devised by him (and still
used) to provide an accurate count of the number of zeros up to a given height.
The search was futile and the evidence continues to build far beyond the reach
of the Blue Pig that the two counts match; now it is known that 59 974 310 000
zeros lie on the line, and of course none have been found off it!
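The sign-change search rests on ξ being real on the critical line. A convenient stand-in is Hardy's function Z(t) = e^{iθ(t)}ζ(½ + it), which (a standard fact assumed here, not derived in the text) differs from ξ(½ + it) by a real factor that never vanishes, and so changes sign at exactly the same points. The sketch evaluates ζ with the series (16.3), uses the standard asymptotic expansion of the Riemann–Siegel function θ(t) (accurate for t of order 10 and above), and counts sign changes on a grid; the step of 0.25 is an assumption that happens to be finer than the gaps between zeros below height 50. This is only the first half of the method: Turing's contribution was a rigorous count of all zeros up to the height, which the sketch does not attempt.

```python
import cmath
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def euler_weights(n: int) -> tuple:
    """Weights c_k = sum_{r=k}^{n} C(r, k) / 2^(r+1) for the truncated series (16.3)."""
    denominator = 2 ** (n + 1)
    return tuple(
        sum(math.comb(r, k) * 2 ** (n - r) for r in range(k, n + 1)) / denominator
        for k in range(n + 1)
    )


def zeta(z: complex, n: int = 200) -> complex:
    """Zeta from (16.3) truncated after r = n (assumed ample for |Im z| <= 50)."""
    z = complex(z)
    eta = sum((-1) ** k * c * (k + 1) ** (-z) for k, c in enumerate(euler_weights(n)))
    return eta / (1 - 2 ** (1 - z))


def theta(t: float) -> float:
    """Riemann-Siegel theta function from its standard asymptotic expansion."""
    return ((t / 2) * math.log(t / (2 * math.pi)) - t / 2 - math.pi / 8
            + 1 / (48 * t) + 7 / (5760 * t ** 3))


def hardy_z(t: float) -> complex:
    """Hardy's Z(t) = e^{i theta(t)} zeta(1/2 + it); real for real t, up to rounding."""
    return cmath.exp(1j * theta(t)) * zeta(complex(0.5, t))


def count_sign_changes(t_start: float, t_end: float, step: float) -> int:
    """Number of sign changes of Z on an evenly spaced grid from t_start to t_end."""
    steps = round((t_end - t_start) / step)
    values = [hardy_z(t_start + i * step).real for i in range(steps + 1)]
    return sum(1 for left, right in zip(values, values[1:]) if left * right < 0)


if __name__ == "__main__":
    print("sign changes of Z on [10, 50]:", count_sign_changes(10.0, 50.0, 0.25))
```

Ten sign changes below height 50 match the ten zeros Gram located there.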

We have mentioned G. H. Hardy several times before and he was one of the 
outstanding mathematicians of his time, making many significant contributions 
to number theory. In his immensely impressive mathematical trophy cabinet 
there was a vast gap waiting to be filled by a proof of the Riemann Hypothesis, 
a gap that remained empty, of course, and we can gain some small insight into 
the man and his view of the Riemann Hypothesis with these three anecdotes. 

• A list of his four most ardent desires (in order) was 

(1) to prove the Riemann Hypothesis; 

(2) to score a century at Lord’s in a test match;

(3) to prove the non-existence of God; 

(4) to assassinate Benito Mussolini. 

The list could vary slightly, but at its top was always the Riemann Hypothesis.
• On each of his regular visits to his Danish mathematical friend Harald
Bohr (younger brother of Niels and the one mentioned before on
p. 56), the unswerving routine was to arrive and sit down to construct an
agenda for the visit; the first point on it was always ‘prove the Riemann
Hypothesis’.

• On the return from one such visit, facing a stormy sea passage, he scribbled
a postcard and posted it to Littlewood, which read, ‘Have proved the
Riemann Hypothesis’; Hardy, the atheist, reasoned that if God did exist,
He would not allow him to die with the unjustified super-reputation that
would have resulted from his proving this most sought after of results.
Hardy arrived safely in England before the postcard arrived.

When he was asked which mathematical problem was the most important,
Hilbert answered, ‘The problem of the zeros of the Zeta function, not only
in mathematics, but absolutely most important!’ Alternatively, one could take
M. Kline’s view, when he said in an interview for ‘Mathematical People’ in
1985:


If I could come back after five hundred years and find that the Riemann
Hypothesis or Fermat’s last ‘theorem’ was proved, I would be
disappointed, because I would be pretty sure, in view of the history 
of attempts to prove these conjectures, that an enormous amount 
of time had been spent on proving theorems that are unimportant 
to the life of man. 

With Andrew Wiles’s contribution to Fermat’s Last Theorem, he must already 
be unhappy and there are any number of current professional and amateur 
mathematicians who would like to make him unhappier still! 

Mathematicians do not like producing ‘conditional’ proofs and if they do so 
it shows the considerable esteem in which an unproven result is held; with this 
said, there are many, many results that begin: ‘Assuming the truth of the Riemann
Hypothesis. . . ’. An observation by Freeman Dyson has brought about important
connections with quantum theory; who knows, the greatest problem of abstract 
pure mathematics might be solved by a physicist — and perhaps experimentally? 
Certainly, fame (and now fortune) await the solver; as the advertising slogan of 
the British National Lottery would have it, ‘It could be you’, although Jonathan 
P. Dowling’s poem (overleaf) may serve as a cautionary warning. 
The Riemann Conjecture

Mein lieber Herr Riemann 
All night I will dream on, 

’bout how you deserve a lecture. 

But of course I allude 
To your famous and shrewd 
Outstanding and unsolved conjecture. 

Oh, I owe you my life, 

My 3 kids and my wife. 

For the proof of the Prime Number Theorem. 
Your Zeta function trick 
Made the proof really slick, 

And those primes — no more do I fear ’em. 

But I just stop to think. 

How I’ve taken to drink, 

And evolved this hysterical laugh- 
Because still I don’t know 
If ζ’s roots will all go
On the line real z is a half! 

So I don’t sleep at night, 

And I’m losing my sight 

In search of this darn thing’s solution. 

As my mind starts to go 

My calculations grow 

In a flood of ‘complex’ confusion. 

I bought a computer; 

Not any astuter. 

It ran for nearly 10 years — no jive! 

But still it doesn’t know
If Zeta’s roots all go 
On that line real z is .5 

Now I sit in my room — 

I feel doomed in the gloom — 

And entombed by mountains of paper. 

Still, I pray that some night 
My ‘ol’ lightbulb’ will light
With the clue that could wrap up this paper. 
The Greek Alphabet
APPENDIX B


Big Oh Notation 


Introduced in 1894 by one Paul Bachmann, later embraced by number theorists 
in general and later still by computer scientists to measure the complexity of 
algorithms, this notation exposes the size of an expression while suppressing 
unnecessary detail. 

For example, $2n^2 + 7n + 6 \to \infty$ as $n \to \infty$ but not really any more quickly
than $n^2$ itself, since as n becomes bigger the $7n + 6$ term becomes increasingly
less relevant and could be any other linear expression in n; put another way,
$(2n^2 + 7n + 6)/n^2 \to 2$ as $n \to \infty$. If the 2 has no relevance, other than it
being a constant, we write $2n^2 + 7n + 6 = O(n^2)$ and in general for positive
functions, $g(n) = O(f(n))$ if g(n) is asymptotically no bigger than a constant
times f(n); that is, f(n) is the dominant asymptotic term of g(n).

This means that O(1) represents a constant and, for example, $\ln n + \ln\ln n = O(\ln n)$.

The use of the O for ‘order’ brings about the appropriate name of ‘big oh’
notation.
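The second example can be checked against the definition. For ln n + ln ln n = O(ln n) the constant 2 serves for every n ≥ 2, because ln y < y for y > 0; yet the ratio to ln n creeps towards 1 very slowly, a reminder that the notation hides both constants and rates. The helper names are illustrative.

```python
import math


def ratio(n: int) -> float:
    """(ln n + ln ln n) / ln n, which tends to 1, but only slowly."""
    log_n = math.log(n)
    return (log_n + math.log(log_n)) / log_n


def bounded_by_two_ln(n: int) -> bool:
    """Witness inequality ln n + ln ln n <= 2 ln n (true for n >= 2, since ln y < y)."""
    log_n = math.log(n)
    return log_n + math.log(log_n) <= 2 * log_n


if __name__ == "__main__":
    for k in (1, 3, 6, 12, 50, 100):
        print(f"n = 10^{k}: ratio = {ratio(10 ** k):.4f}")
```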
APPENDIX C


Taylor Expansions 


The simplest functions are polynomials, since they are generally very suscep- 
tible to standard mathematical processes. If a function is not a polynomial, 
we can look for the best polynomial approximation to it, at least over some 
interval, but we must expect global difficulties; for example, the function may 
have a vertical asymptote or be periodic or possess any other non-polynomial 
behaviour. We naturally proceed by the degree of the polynomial, that is, the 
highest power of x. 

C.1 Degree 1

It is intuitively clear that the best straight line that approximates a given curve
at a given point is the tangent to the curve at that point (see Figure C.1). If P is
the point (a, f(a)), the gradient of the curve at P is f′(a) and the equation of
the tangent is y − f(a) = f′(a)(x − a) and so we have the approximation

$$f(x) \approx f(a) + (x - a)f'(a),$$

which is Taylor’s first approximation.
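As a quick illustration of Taylor's first approximation, the sketch approximates √x near a = 4 by its tangent line. The example function and helper names are arbitrary choices; the point is that shrinking x − a by a factor of 10 shrinks the error by roughly a factor of 100.

```python
import math


def tangent_approx(f, df, a: float, x: float) -> float:
    """Taylor's first approximation: f(x) ~ f(a) + (x - a) f'(a)."""
    return f(a) + (x - a) * df(a)


def sqrt_derivative(t: float) -> float:
    """Derivative of sqrt(t)."""
    return 0.5 / math.sqrt(t)


if __name__ == "__main__":
    for h in (0.1, 0.01):
        approx = tangent_approx(math.sqrt, sqrt_derivative, 4.0, 4.0 + h)
        print(f"x = {4 + h}: tangent {approx:.8f}, true {math.sqrt(4 + h):.8f}")
```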

C.2 Degree 2 

Above we have simply used our intuition as to what the best approximating
straight line would be, but we could have been more rigorous. The general
straight line has two independent parameters, which together uniquely specify
it: in its standard form y = mx + c they are m and c. Two independent parameters
means that we can impose two independent conditions on the line if we are
to judge it to be the best one to achieve our approximation, and what better
conditions than that the line passes through P and has the same gradient as
f(x) at P? In other words, the line is indeed the tangent to the curve at P.
With a degree 2 approximation, we are approximating the function near P by a
parabola, which in its general form y = Ax² + Bx + C has three independent
parameters A, B, C. It is perfectly natural to impose the same two conditions
as before, but what of the third?
If we look at Figure C.2, we will see two parabolas being used as approximations.
They both pass through P and both have the same gradient as f(x)
at P but we would surely prefer the upper one to the lower since it bends in
the right direction. This third condition ought to distinguish between the two
possibilities and since concavity is measured by the second derivative we will
insist that the second derivatives of the function and of the quadratic approximation
are equal at P. If we agree to write the quadratic in the more useful way
of y = A(x − a)² + B(x − a) + C, we can easily evaluate the three parameters
as follows:


$$\frac{dy}{dx} = 2A(x - a) + B, \qquad \frac{d^2y}{dx^2} = 2A.$$

Putting x = a in the expressions for y, dy/dx and d²y/dx² and imposing our
conditions then gives C = f(a), B = f′(a), A = ½f″(a) and the approximation
as

$$f(x) \approx f(a) + (x - a)f'(a) + \tfrac12(x - a)^2 f''(a).$$
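The gain from the extra condition is easy to see at a = 0 for cos x, where the tangent is the horizontal line y = 1 and cannot follow the curve's bend at all. The sketch (an illustrative choice of function, with derivatives supplied by hand) compares the two approximations at x = 0.3.

```python
import math


def tangent_approx(f, df, a: float, x: float) -> float:
    """f(a) + (x - a) f'(a)."""
    return f(a) + (x - a) * df(a)


def quadratic_approx(f, df, d2f, a: float, x: float) -> float:
    """f(a) + (x - a) f'(a) + (1/2)(x - a)^2 f''(a)."""
    return f(a) + (x - a) * df(a) + 0.5 * (x - a) ** 2 * d2f(a)


def d_cos(t: float) -> float:
    return -math.sin(t)


def d2_cos(t: float) -> float:
    return -math.cos(t)


if __name__ == "__main__":
    x = 0.3
    print("tangent error:  ", abs(tangent_approx(math.cos, d_cos, 0.0, x) - math.cos(x)))
    print("quadratic error:", abs(quadratic_approx(math.cos, d_cos, d2_cos, 0.0, x) - math.cos(x)))
```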


In general, we can continue the process by approximating by a cubic, quartic,
etc., insisting that each higher derivative at P is equal to that derivative of
the function at P to get

$$f(x) \approx f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \frac{(x - a)^3}{3!}f'''(a) + \cdots,$$

noting that the denominators are factorials because of the repeated bringing
down of the powers in the differentiation process.
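Here is a minimal Python sketch of the process. It assumes we already know the derivatives of f at a. Purely for illustration we take f(x) = e^x, since every derivative of e^x at a equals e^a; the helper name `taylor_poly` is our own. As the degree rises, the error at x = 1.3 should shrink.

```python
import math

def taylor_poly(derivs_at_a, a, x):
    """Evaluate f(a) + (x-a) f'(a) + (x-a)^2/2! f''(a) + ... for the given derivatives."""
    return sum(d * (x - a) ** k / math.factorial(k)
               for k, d in enumerate(derivs_at_a))

# Illustrative choice: f(x) = e^x, so every derivative at a equals e^a.
a, x = 1.0, 1.3
errors = []
for degree in range(4):
    approx = taylor_poly([math.exp(a)] * (degree + 1), a, x)
    errors.append(abs(approx - math.exp(x)))
    print(f"degree {degree}: {approx:.6f}  error {errors[-1]:.2e}")
```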

C.3 Examples

The expansions

$$(1 + x)^\alpha \approx 1 + \alpha x + \alpha(\alpha - 1)\frac{x^2}{2!} + \alpha(\alpha - 1)(\alpha - 2)\frac{x^3}{3!} + \cdots,$$

$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots,$$

$$\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$

are easily computed by taking a = 0, and with this value of a the name Taylor
is often replaced by the name Maclaurin. An important case where we cannot
approximate taking a = 0 is the function ln x, since it simply is not defined
there. Rather than take another value for a, it is more convenient to shift the
function sideways by 1 to get
$\ln(1 + x) \approx x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \tfrac{1}{4}x^4 + \cdots$.

C.4 Convergence

It is clear that, provided the function is infinitely differentiable, the Taylor
process can be continued indefinitely (although even then there can be problems,
as we will mention later) to give an infinite series rather than a polynomial.
Although the series is designed to approximate at a point, we would expect a
decent approximation in some neighbourhood of the point; just how big that
neighbourhood is is determined by the size of the error term involved for any
degree of approximation, and in particular by its asymptotic size. We will not
consider this and therefore avoid Taylor's Theorem, but, amazingly, for a number
of the important functions the error term is asymptotically zero for all x, and
so the infinite series equals the function. Putting α = −1 in the first example
above results in $1/(1 + x) \approx 1 - x + x^2 - x^3 + \cdots$, which we know
from the theory of geometric series is exact for |x| < 1. So approximating
1/(1 + x) at the point (0, 1) results in an exact alternative,
$1 - x + x^2 - x^3 + \cdots$, in |x| < 1. The news is better still with, for
example, $e^x$ and sin x, since the infinite series equal the function for all
x. In fact, the series can be used to define such functions, and of course the
series can make sense with x ∈ ℝ replaced by z ∈ ℂ.
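The contrast between "exact only for |x| < 1" and "exact for all x" can be checked numerically. The sketch below compares partial sums of the geometric series for 1/(1 + x) inside and outside |x| < 1 with partial sums of the exponential series. The test points and the number of terms are arbitrary choices of ours. The same exponential series is also evaluated at a complex argument.

```python
import cmath
import math

def partial_sum(term, n):
    """Sum of the first n terms term(0) + ... + term(n-1)."""
    return sum(term(k) for k in range(n))

def geometric_term(x):
    # terms of 1 - x + x^2 - x^3 + ..., the series for 1/(1 + x)
    return lambda k: (-x) ** k

def exp_term(z):
    # terms of 1 + z + z^2/2! + ..., valid for real or complex z
    return lambda k: z ** k / math.factorial(k)

inside = abs(partial_sum(geometric_term(0.5), 60) - 1 / 1.5)    # |x| < 1: tiny
outside = abs(partial_sum(geometric_term(1.5), 60) - 1 / 2.5)   # |x| > 1: huge
exp_real = abs(partial_sum(exp_term(10.0), 60) - math.exp(10.0))
exp_complex = abs(partial_sum(exp_term(1 + 2j), 40) - cmath.exp(1 + 2j))
print(inside, outside, exp_real, exp_complex)
```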
APPENDIX D


Complex Function Theory 


D.1 Complex Differentiation


With a real-valued function of a real variable, the standard definition of the
derivative is

$$f'(x) = \lim_{\delta x \to 0} \frac{f(x + \delta x) - f(x)}{\delta x},$$


given that the limit exists. It was Cauchy who provided this rigorous definition, 
which has great geometric appeal, as a variable chord ever more accurately 
approximates a given tangent; zooming in as the chord shortens forces the eye 
to accept that the function, the chord and the tangent are all blending into one 
another, making it that bit easier to believe that the final limit is indeed the 
gradient of the tangent to the curve at the point and, in fact, defines that tangent. 
It is in the direction in which δx → 0 that the greatest subtlety lies, as the
definition of derivative relies on that limit being the same no matter from which
direction δx → 0; f(x) = |x| is not differentiable at the origin because of this.
If we replace x ∈ ℝ by z ∈ ℂ, we can formally write

$$f'(z) = \lim_{\delta z \to 0} \frac{f(z + \delta z) - f(z)}{\delta z}.$$


The comfortable geometric interpretation has deserted us, leaving a gap filled 
only by cold analysis and, as with the real case, the formula is taken as the 
definition of the derivative of the function at the point z. Again, if we think
carefully about δz → 0, we now have an infinite number of directions from
which to choose, rather than just the two, and if we insist that the limit does 
not depend on the direction, we are surely asking a great deal more than in the 
real case; and so it turns out. If we recall that C includes R, there are three 
cases to consider, the first two of which we can dispose of quickly — but not the 
third. 
D.1.1 A real-valued function of a complex variable


As an example, consider the simple function f(z) = x, where z = x + iy. If
we approach the limit along the real axis, we get δz = δx and

$$f'(z) = \lim_{\delta z \to 0} \frac{f(z + \delta z) - f(z)}{\delta z} = \lim_{\delta x \to 0} \frac{x + \delta x - x}{\delta x} = 1,$$

whereas, along the imaginary axis, δz = iδy and

$$f'(z) = \lim_{\delta z \to 0} \frac{f(z + \delta z) - f(z)}{\delta z} = \lim_{i\delta y \to 0} \frac{x - x}{i\,\delta y} = 0.$$


So, this seemingly most simple function has no derivative. If we look at things 
more closely, we can identify the root of the problem: approaching the limit 
along real values must mean that if the limit exists it is real, whereas, approach- 
ing it along imaginary values must mean that if it exists it is imaginary, since 
the denominator is imaginary and the numerator is real. The only possible rec- 
onciliation is if the imaginary limit is 0, in which case, if the function is to be 
differentiable, the real limit must be 0 also. In summary, if such a function is 
differentiable, its derivative must be identically 0. 
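The direction dependence can be seen numerically. This sketch evaluates the difference quotient of f(z) = Re z at a point of our choosing, with δz of fixed small size pointing in three directions. It assumes a step of 1e-6 is small enough for the pattern to be clear.

```python
import cmath
import math

def f(z):
    """f(z) = Re z = x, a real-valued function of a complex variable."""
    return z.real

def difference_quotient(func, z, dz):
    return (func(z + dz) - func(z)) / dz

z0 = 0.7 + 0.2j
step = 1e-6
quotients = {}
for angle in (0.0, math.pi / 4, math.pi / 2):
    dz = step * cmath.exp(1j * angle)   # approach z0 from direction `angle`
    quotients[angle] = difference_quotient(f, z0, dz)
    print(f"angle {angle:.3f}: {quotients[angle]:.6f}")
```

Along the real axis the quotient is 1 and along the imaginary axis it is 0, as in the text. Along the diagonal it is 0.5 − 0.5i, a third value, so no single limit exists.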


D.1.2 A complex-valued function of a real variable 


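If we write f(x) = u(x) + iv(x), then

$$\begin{aligned}
f'(x) &= \lim_{\delta x \to 0} \frac{\bigl(u(x + \delta x) + iv(x + \delta x)\bigr) - \bigl(u(x) + iv(x)\bigr)}{\delta x} \\
&= \lim_{\delta x \to 0} \frac{u(x + \delta x) - u(x) + iv(x + \delta x) - iv(x)}{\delta x} \\
&= \lim_{\delta x \to 0} \frac{u(x + \delta x) - u(x)}{\delta x} + i\lim_{\delta x \to 0} \frac{v(x + \delta x) - v(x)}{\delta x} \\
&= \frac{du}{dx} + i\,\frac{dv}{dx},
\end{aligned}$$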


provided that the derivatives exist. The matter is therefore reduced to the two 
real cases. 


D.1.3 A complex-valued function of a complex variable 

We can write that if z = x + iy, then f(z) = u(x, y) + iv(x, y). This third
and final case has deep-lying consequences and lives at the heart of complex
calculus, and it has its surprises. First of all, a name. Any such function, which
has a meaningful derivative wherever it is defined in a region, is called analytic
in that region (another term used is holomorphic), and if the region happens to
be the whole of the complex plane it is said to be entire.

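Suppose that we once again let δz → 0, first along the real axis and then along
the imaginary axis:

$$\begin{aligned}
f'(z) &= \lim_{\delta x \to 0} \frac{f(z + \delta x) - f(z)}{\delta x} \\
&= \lim_{\delta x \to 0} \left( \frac{u(x + \delta x, y) - u(x, y)}{\delta x} + i\,\frac{v(x + \delta x, y) - v(x, y)}{\delta x} \right) = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}
\end{aligned}$$

and

$$\begin{aligned}
f'(z) &= \lim_{i\delta y \to 0} \frac{f(z + i\delta y) - f(z)}{i\delta y} \\
&= \lim_{\delta y \to 0} \left( \frac{u(x, y + \delta y) - u(x, y)}{i\delta y} + i\,\frac{v(x, y + \delta y) - v(x, y)}{i\delta y} \right) = -i\,\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y},
\end{aligned}$$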


and for these to be the same we must have that

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$

These are called the Cauchy-Riemann equations. They are, of course, simply a
necessary condition for the derivative to be properly defined. They turn out not
quite to be sufficient: for that, we also need all four partial derivatives to be
continuous. Using them we have four equivalent ways of writing the derivative of
an analytic function; in particular,

$$f'(z) = \frac{df}{dz} = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x}.$$


It is not difficult to see that the standard rules of differentiation carry across
to the complex case (linearity, product rule, quotient rule and chain rule), as
do a number of reasonable general results; in particular, if $f(z) = z^n$, then
$f'(z) = nz^{n-1}$ for n ∈ ℕ. More general results can carry across too, for
example:

if f'(z) = 0 for all z, then f(z) = c, provided the domain is connected.


The qualification that the domain should be connected is needed even in the
real case, since if

$$f(x) = \begin{cases} c_1, & x < 1, \\ c_2, & x > 2, \end{cases} \qquad c_1 \ne c_2,$$

its derivative is clearly zero, yet f is not constant; the analogous complex case is

$$f(z) = \begin{cases} c_1, & |z| < 1, \\ c_2, & |z| > 2, \end{cases}$$

and again, clearly, f'(z) = 0.

Now suppose that the domain is connected. If f'(z) = 0, then

$$f'(z) = \frac{\partial u}{\partial x} + i\,\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\,\frac{\partial u}{\partial y} = 0,$$

which of course means that

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = \frac{\partial u}{\partial y} = 0.$$

Since ∂u/∂x = 0, u(x, y) is constant along horizontal line segments; similarly,
since ∂u/∂y = 0, u(x, y) is constant along vertical line segments. The same
argument holds for v(x, y). Therefore, f(z) = u(x, y) + iv(x, y) is constant
along each horizontal and vertical line segment in the domain. But the domain
is connected, and so any two points $z_1$, $z_2$ in it can be joined by a series of
horizontal and vertical line segments which lie entirely in the domain. The
function is constant along all of them, and consequently $f(z_1) = f(z_2)$. Since
$z_1$ and $z_2$ are arbitrary, f(z) must be constant in the whole domain.

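As a second reasonable general result we have that if |f(z)| = c, then f(z)
is constant. To establish this, use the definition of |·| to get
$|f(z)| = c \Rightarrow u^2 + v^2 = c^2$. Partial differentiation with respect to
x and then y gives

$$2u\,\frac{\partial u}{\partial x} + 2v\,\frac{\partial v}{\partial x} = 0 \quad\text{and}\quad 2u\,\frac{\partial u}{\partial y} + 2v\,\frac{\partial v}{\partial y} = 0.$$

Cancelling the 2 and using the Cauchy-Riemann equations gives

$$u\,\frac{\partial u}{\partial x} - v\,\frac{\partial u}{\partial y} = 0 \quad\text{and}\quad u\,\frac{\partial u}{\partial y} + v\,\frac{\partial u}{\partial x} = 0.$$

Treat these as two equations in the two unknowns ∂u/∂x and ∂u/∂y (multiply
the first by u, the second by v, and add) to get

$$(u^2 + v^2)\,\frac{\partial u}{\partial x} = c^2\,\frac{\partial u}{\partial x} = 0,$$

so either c = 0 and f(z) = 0 identically, or ∂u/∂x = 0. Similarly,

$$\frac{\partial u}{\partial y} = \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0.$$

Therefore, f'(z) = 0 and, from above, f(z) is constant. Actually, the result holds
if we have Re f(z) = c or Im f(z) = c.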
It is hardly surprising that the function f(z) = z is differentiable: we have
u = x and v = y, making

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = 1 \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x} = 0,$$

but it is hardly believable that $f(z) = \bar{z}$ is not (here, u = x and v = −y,
which cause the first Cauchy-Riemann equation to fail): intuition has no place in
the study of the behaviour of complex functions!
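Here is a numerical sketch of the same test. The four partial derivatives are estimated by central differences, and the code measures how badly the Cauchy-Riemann equations fail at one point. The test point, the step size and the helper names `partials` and `cauchy_riemann_residual` are our own illustrative choices. Passing this check at a single point suggests the equations hold there but does not prove analyticity.

```python
def partials(f, z, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) for f = u + iv at z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return df_dx.real, df_dy.real, df_dx.imag, df_dy.imag

def cauchy_riemann_residual(f, z):
    """|u_x - v_y| + |u_y + v_x|: close to 0 where the equations hold."""
    u_x, u_y, v_x, v_y = partials(f, z)
    return abs(u_x - v_y) + abs(u_y + v_x)

z0 = 0.3 + 0.4j
for name, f in [("z", lambda z: z),
                ("z**2", lambda z: z * z),
                ("conj(z)", lambda z: z.conjugate())]:
    print(name, cauchy_riemann_residual(f, z0))
```

For the conjugate we have u_x = 1 and v_y = −1, so the residual is 2 rather than 0.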

If we use the ideas of Taylor expansions to extend the standard elementary
functions, we can formally give meaning to

$$\sin z = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots,$$

$$\cos z = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots,$$

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$

and others like them, all of which can be shown to converge for all z ∈ ℂ.
Notice that term-by-term differentiation yields the expected results

$$\frac{d}{dz}\sin z = \cos z, \qquad \frac{d}{dz}\cos z = -\sin z.$$

Furthermore, we have that

$$e^{iz} = \cos z + i\sin z,$$

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i}, \qquad \cos z = \frac{e^{iz} + e^{-iz}}{2},$$

$$\sinh z = \frac{e^z - e^{-z}}{2} = -i\sin iz, \qquad \cosh z = \frac{e^z + e^{-z}}{2} = \cos iz,$$

etc.
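These identities can be spot-checked with Python's `cmath` module, which implements the complex elementary functions. The sketch below evaluates each identity at one arbitrary test point of our choosing. It also compares a central-difference derivative of sin z with cos z. This is a numerical sanity check, not a proof.

```python
import cmath

z = 0.8 - 1.3j   # an arbitrary test point
checks = {
    "e^{iz} = cos z + i sin z": cmath.exp(1j * z) - (cmath.cos(z) + 1j * cmath.sin(z)),
    "sin z = (e^{iz} - e^{-iz})/(2i)": cmath.sin(z) - (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j,
    "cos z = (e^{iz} + e^{-iz})/2": cmath.cos(z) - (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2,
    "sinh z = -i sin iz": cmath.sinh(z) - (-1j) * cmath.sin(1j * z),
    "cosh z = cos iz": cmath.cosh(z) - cmath.cos(1j * z),
}
for identity, difference in checks.items():
    print(f"{identity}: |difference| = {abs(difference):.1e}")

# d/dz sin z = cos z, estimated by a central difference along the real direction
h = 1e-6
dsin = (cmath.sin(z + h) - cmath.sin(z - h)) / (2 * h)
print(dsin, cmath.cos(z))
```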


All of these (and many more such expressions) are no more than their real
counterparts with z replacing x, which begins to bring about a cosy familiarity,
soon broken by the equation cos z = 2 having solutions. That it does must mean
that

$$\frac{e^{iz} + e^{-iz}}{2} = 2, \qquad e^{iz} + e^{-iz} = 4, \qquad e^{2iz} + 1 = 4e^{iz},$$

$$e^{iz} = \frac{4 \pm \sqrt{16 - 4}}{2} = \frac{4 \pm \sqrt{12}}{2} = 2 \pm \sqrt{3}.$$

If we allow the usual taking of logs, we are forced to write $iz = \ln(2 \pm \sqrt{3})$
and $z = -i\ln(2 \pm \sqrt{3})$. This gives the solution $z = -i\ln(2 + \sqrt{3})$
and, since $(2 + \sqrt{3})(2 - \sqrt{3}) = 1$, the solution
$z = -i\ln(2 - \sqrt{3}) = i\ln(2 + \sqrt{3})$. This is uncomfortable because
cos z = 2 has a solution at all. It is less comfortable still because cos z, like
cos x, has period 2π, so there must be infinitely many solutions, whereas the
usual taking of logs has produced only two. This ‘sophistry’ will be explained
later.
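Here is a quick numerical check using Python's `cmath`. The algebra reduces cos z = 2 to e^{iz} = 2 ± √3. The two values −i ln(2 ± √3), together with their shifts by whole multiples of 2π, should all satisfy cos z = 2. The range of shifts n = −2, …, 2 is an arbitrary choice.

```python
import cmath
import math

# From e^{iz} = 2 ± sqrt(3) we get z = -i ln(2 ± sqrt(3)).
base = [-1j * cmath.log(2 + math.sqrt(3)), -1j * cmath.log(2 - math.sqrt(3))]
# cos z has period 2*pi, so shifting by 2*pi*n gives further solutions.
solutions = [z + 2 * math.pi * n for z in base for n in range(-2, 3)]
for z in solutions:
    print(f"z = {z:.6f}   cos z = {cmath.cos(z):.12f}")
```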
D.2 Weierstrass Function

With our current knowledge of fractals, the idea of a real function existing which
is everywhere continuous but nowhere differentiable is not novel. Back in
1861, however, none was known, although their existence was suspected, in particular
by Riemann, who suggested the idea to some of his students and even provided
a candidate, but no proof. It took until 1872 for Weierstrass to provide his
own function to do the job, one of those events that helped to force more rigour
into mathematics. In fact, he proved that for b an odd integer greater than 1 and
for 0 < a < 1, if $ab > 1 + \tfrac{3}{2}\pi$, then the function

$$f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x)$$

is indeed everywhere continuous but nowhere differentiable; Hardy later extended
the result to ab ≥ 1 (see Figure D.1).

In the complex case we do not have to look nearly so hard to find such a
monster, as the simple modulus function will do the job. We have seen that
in the real case the function causes a difficulty in that it is not differentiable
at the origin, although it is obviously everywhere continuous. In the complex
case, matters are much worse: f(z) = |z| is a continuous, real-valued function
of a complex variable, and we have seen that if its derivative exists, it must be
0, which seems suspicious. In fact, its derivative exists nowhere, and we can
prove that using the Cauchy-Riemann equations, since $u(x, y) = \sqrt{x^2 + y^2}$
and v(x, y) = 0; therefore

$$\frac{\partial u}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \frac{\partial u}{\partial y} = \frac{y}{\sqrt{x^2 + y^2}}, \qquad \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0,$$

and, as long as x and y are not both 0, the Cauchy-Riemann equations are
clearly not satisfied. The case x = y = 0 gives the indeterminate form 0/0, and
we have to go back to first principles to get

$$\left.\frac{\partial u}{\partial x}\right|_{(0,0)} = \lim_{h \to 0} \frac{u(h, 0) - u(0, 0)}{h} = \lim_{h \to 0} \frac{\sqrt{h^2}}{h} = \lim_{h \to 0} \frac{|h|}{h} = \begin{cases} 1, & h \to 0^+, \\ -1, & h \to 0^-, \end{cases}$$

just as with f(x) = |x|. The same argument shows that
$\left.\partial u/\partial y\right|_{(0,0)}$ does not exist.

The function f(z) = |z| is clearly complicated, but it is not as bad as its
companion f(z) = arg z, which is not even properly defined, as it is only
determined up to integer multiples of 2π. When it is restricted to (−π, π], it is
usually written with a capital ‘A’, and then, for x > 0,
$f(z) = \operatorname{Arg} z = \tan^{-1}(y/x)$. Once again, it is a real-valued
function of a complex variable, and so we know that if its derivative exists
anywhere, it must be zero. If we apply the Cauchy-Riemann equations once more
we get

$$u(x, y) = \tan^{-1}\frac{y}{x}, \qquad v(x, y) = 0,$$

$$\frac{\partial u}{\partial x} = \frac{-y}{x^2 + y^2}, \qquad \frac{\partial u}{\partial y} = \frac{x}{x^2 + y^2}, \qquad \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0,$$

and once again we have that, as long as x and y are not both 0, the Cauchy-Riemann
equations are not satisfied. Since the function is not defined at x = y = 0, it is
nowhere differentiable.


D.3 Complex Logarithms

We can define the complex logarithm by its formal power series

$$\ln(1 + z) = z - \tfrac{1}{2}z^2 + \tfrac{1}{3}z^3 - \cdots, \qquad |z| < 1,$$

and, just as in the real case,

$$\ln\frac{1 + z}{1 - z} = 2\left(z + \tfrac{1}{3}z^3 + \tfrac{1}{5}z^5 + \cdots\right), \qquad |z| < 1,$$

if we want to reach complex numbers outside the unit circle. That, however,
disguises the important subtlety brought about by the ambiguous nature of the
argument of a complex number. If we take the definition of the logarithm as the
inverse of the exponential function, the matter is much clearer. So, write
w = ln z if $z = e^w$. If w = u + iv and z = r(cos θ + i sin θ), we have that

$$z = e^w = e^{u + iv} = e^u e^{iv} = e^u(\cos v + i\sin v) = r(\cos\theta + i\sin\theta).$$

This gives two expressions for z and, in particular, for |z|, so that $e^u = r$
and hence u = ln r, a genuine real logarithm. Also, cos v + i sin v = cos θ + i sin θ,
which means that cos v = cos θ and sin v = sin θ must both be satisfied, and so
v = θ + 2nπ, where n ∈ ℤ. All of this means that ln z = ln r + i(θ + 2nπ) is a
multivalued function. Restricting to the principal arg function Arg makes n = 0,
and the principal logarithm function is written Ln z = ln r + iθ for −π < θ ≤ π,
or Ln z = ln |z| + i Arg z. In the series above, the lowercase ‘l’ should be
replaced by its capital. The earlier solutions $z = -i\ln(2 \pm \sqrt{3})$ of
cos z = 2 then become
$z = -i\left(\ln(2 \pm \sqrt{3}) + 2n\pi i\right) = 2n\pi \mp i\ln(2 + \sqrt{3})$,
n ∈ ℤ, which accounts for the periodicity of cos z.

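Now we can differentiate Ln z in the usual way:

$$\operatorname{Ln} z = \ln\sqrt{x^2 + y^2} + i\tan^{-1}\frac{y}{x},$$

so $u(x, y) = \tfrac{1}{2}\ln(x^2 + y^2)$, $v(x, y) = \tan^{-1}(y/x)$ and

$$\frac{\partial u}{\partial x} = \frac{x}{x^2 + y^2}, \qquad \frac{\partial u}{\partial y} = \frac{y}{x^2 + y^2}, \qquad \frac{\partial v}{\partial x} = \frac{-y}{x^2 + y^2}, \qquad \frac{\partial v}{\partial y} = \frac{x}{x^2 + y^2}.$$

The Cauchy-Riemann equations are satisfied and

$$\frac{d}{dz}\operatorname{Ln} z = \frac{x}{x^2 + y^2} - i\,\frac{y}{x^2 + y^2} = \frac{x - iy}{x^2 + y^2} = \frac{1}{z},$$

as we might have hoped.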
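Here is a short Python sketch of the principal logarithm. It builds Ln z = ln|z| + i Arg z from `math.log` and `cmath.phase` (which returns an angle in [−π, π]). It then compares the result with `cmath.log` and checks numerically that every branch ln z = Ln z + 2nπi exponentiates back to z. It also checks that a difference quotient approximates 1/z. The helper name `principal_log` and the test point are our own choices.

```python
import cmath
import math

def principal_log(z):
    """Ln z = ln|z| + i Arg z, with Arg taken from cmath.phase (range [-pi, pi])."""
    return complex(math.log(abs(z)), cmath.phase(z))

z = -1 + 1j
# ln z is multivalued: every Ln z + 2*n*pi*i is a logarithm of z.
branches = [principal_log(z) + 2j * math.pi * n for n in range(-2, 3)]

h = 1e-6
derivative = (principal_log(z + h) - principal_log(z - h)) / (2 * h)  # step along the real axis
print(principal_log(z), cmath.log(z))
print(derivative, 1 / z)
```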

The mixture of surprise and familiarity is an inevitable part of the demand- 
ing definition of complex differentiability and it would be reasonable to think 
that, with its lesser demand of continuity, complex integration would be more 
predictable in its behaviour — but once again intuition fails us. 


D.4 Complex Integration 
D.4.1 The definite integral 

Before we can properly discuss complex integration we need to understand the 
topological idea of a region being ‘simply connected’, which really means that 
it has no holes. Put more mathematically, we will say that a region is simply 
connected if any closed curve drawn in it can be continuously deformed to any 
other closed curve in it, without leaving the region; we have already met this 
idea on p. 227. Geometrically, see Figure D.2. 
Figure D.2. Disc and annulus.

Figure D.3.


Clearly, any two closed curves drawn inside the disc can be continuously 
deformed into one another without leaving the disc, but the two drawn in the 
annulus cannot be. Another way of saying the same thing is that in a simply 
connected region, while staying inside the region, any closed curve can be 
shrunk to a point. Two other definitions will also be useful to us: a curve is said 
to be simple if it does not touch itself or self-intersect and it is said to be smooth 
if it has a well-defined tangent at every point. Now to the theory. 

As with differentiation, the definition of the complex definite integral relies
heavily on its real counterpart, and so it is sensible to look at that first.
Suppose that f(x) is a continuous real-valued function of a real variable,
defined for a ≤ x ≤ b, and divide the interval [a, b] by introducing the points
$a = x_0, x_1, x_2, \ldots, x_n = b$.

Then we define

$$S_n = \sum_{r=1}^{n} f(\xi_r)(x_r - x_{r-1}) = \sum_{r=1}^{n} f(\xi_r)\,\delta x_r,$$

where $\xi_r$ is any point in the interval $[x_{r-1}, x_r]$, as the sum of areas of
rectangles approximating the area under the curve; $S_n \to \int_a^b f(x)\,dx$ in
the limit as n → ∞ (with the width of every subinterval tending to 0).
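Here is a minimal sketch of $S_n$ in Python. It assumes equal subintervals and takes each $\xi_r$ to be the midpoint of its subinterval; the text allows any point, and the midpoint is simply a convenient choice. The example integrand sin x on [0, π], whose integral is 2, is also our own choice.

```python
import math

def riemann_sum(f, a, b, n):
    """S_n for n equal subintervals, taking xi_r as the midpoint of [x_{r-1}, x_r]."""
    dx = (b - a) / n
    return sum(f(a + (r - 0.5) * dx) * dx for r in range(1, n + 1))

sums = {n: riemann_sum(math.sin, 0.0, math.pi, n) for n in (4, 16, 64, 256)}
for n, s in sums.items():
    print(n, s, abs(s - 2.0))   # the exact integral of sin x over [0, pi] is 2
```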

Now suppose that we have a smooth curve C defined in the complex plane,
which starts at the point a and continues to the point b, and is divided by points
$a = z_0, z_1, z_2, \ldots, z_n = b$. Introduce the interior points $\xi_r$ to get
Figure D.4 and

Figure D.4.

$$S_n = \sum_{r=1}^{n} f(\xi_r)(z_r - z_{r-1}) = \sum_{r=1}^{n} f(\xi_r)\,\delta z_r.$$

Now take the limit as n → ∞ to get

$$S_n \to \int_C f(z)\,dz.$$

The geometric interpretation of areas of rectangles ever better approximating 
the area under a curve is lost, but we have a formal and natural extension of the 
idea. 

If we represent C in the parametrized form z(t) = x(t) + iy(t), where
z(α) = a and z(β) = b, and rewrite the expression in a slightly different way,
we get

$$\sum f(z(t))\bigl(z(t + \delta t) - z(t)\bigr) = \sum f(z(t))\,\frac{z(t + \delta t) - z(t)}{\delta t}\,\delta t \;\to\; \int_\alpha^\beta f(z(t))\,\frac{dz(t)}{dt}\,dt \quad (\delta t \to 0),$$

which makes it clear why the curve needs to be smooth. In short, we have

$$\int_C f(z)\,dz = \int_\alpha^\beta f(z)\,\frac{dz}{dt}\,dt.$$
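The parametrized formula translates directly into a numerical sketch. The code below approximates $\int_\alpha^\beta f(z(t))\,z'(t)\,dt$ with the midpoint rule, which is our choice of quadrature and not something the text prescribes. The caller supplies z(t) and z'(t). The helper name `contour_integral` is hypothetical. As a check, the code integrates f(z) = 1 and f(z) = z along the straight segment from 0 to 1 + i.

```python
def contour_integral(f, z, dz_dt, alpha, beta, n=2000):
    """Midpoint-rule approximation of the integral of f(z(t)) z'(t) dt from alpha to beta."""
    h = (beta - alpha) / n
    total = 0j
    for k in range(n):
        t = alpha + (k + 0.5) * h
        total += f(z(t)) * dz_dt(t) * h
    return total

# Straight segment from a to b: z(t) = a + t(b - a), 0 <= t <= 1, so z'(t) = b - a.
a, b = 0j, 1 + 1j
segment = (lambda t: a + t * (b - a), lambda t: b - a, 0.0, 1.0)
one = contour_integral(lambda z: 1, *segment)        # expect b - a
identity = contour_integral(lambda z: z, *segment)   # expect (b^2 - a^2)/2 = i
print(one, identity)
```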

The standard rules of linearity are inherited from the sums $S_n$ to give

$$\int_C \bigl(f_1(z) + f_2(z)\bigr)\,dz = \int_C f_1(z)\,dz + \int_C f_2(z)\,dz$$

and

$$\int_C k f(z)\,dz = k\int_C f(z)\,dz \quad \text{for } k \in \mathbb{C};$$

for the same reason, if C is made up from two smooth curves, $C_1$ and $C_2$,

$$\int_C f(z)\,dz = \int_{C_1} f(z)\,dz + \int_{C_2} f(z)\,dz,$$

and further that

$$\int_{\overleftarrow{C}} f(z)\,dz = -\int_{\overrightarrow{C}} f(z)\,dz,$$

where the arrows indicate the direction in which C is traversed.


D.5 A Useful Inequality 

Suppose that |f(z)| ≤ M for all z on C and that C has length L; then

$$|S_n| = \left|\sum_{r=1}^{n} f(\xi_r)\,\delta z_r\right| \le \sum_{r=1}^{n} |f(\xi_r)|\,|\delta z_r| \le M\sum_{r=1}^{n} |\delta z_r|.$$

Since $|\delta z_r|$ is the length of the chord joining $z_r$ and $z_{r-1}$, as n → ∞,

$$\sum_{r=1}^{n} |\delta z_r| \to L,$$

by definition of the length of a curve, and we have the result that

$$\left|\int_C f(z)\,dz\right| \le ML.$$
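Here is a numerical illustration of the ML inequality, assuming midpoint-rule quadrature. The first example is f(z) = z² on the segment from 0 to 1 + i, where M = |1 + i|² = 2 and L = √2. The second is f(z) = 1/z on the unit circle, where M = 1 and L = 2π and the bound is attained exactly. Both examples and the helper `contour_integral` are our own choices.

```python
import cmath
import math

def contour_integral(f, z, dz_dt, alpha, beta, n=4000):
    """Midpoint-rule approximation of the integral of f(z(t)) z'(t) dt."""
    h = (beta - alpha) / n
    return sum(f(z(alpha + (k + 0.5) * h)) * dz_dt(alpha + (k + 0.5) * h) * h
               for k in range(n))

# Example 1: f(z) = z^2 on the segment from 0 to 1 + i.
value1 = contour_integral(lambda z: z * z, lambda t: t * (1 + 1j), lambda t: 1 + 1j, 0.0, 1.0)
M1, L1 = abs(1 + 1j) ** 2, abs(1 + 1j)   # max |z^2| on the segment, and its length

# Example 2: f(z) = 1/z on the unit circle, where the bound is attained.
value2 = contour_integral(lambda z: 1 / z, lambda t: cmath.exp(1j * t),
                          lambda t: 1j * cmath.exp(1j * t), 0.0, 2 * math.pi)
M2, L2 = 1.0, 2 * math.pi

print(abs(value1), M1 * L1)   # about 0.943 <= 2.828
print(abs(value2), M2 * L2)   # about 6.283 <= 6.283
```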


D.6 The Indefinite Integral 

With real-valued functions of a real variable, integration is, of course, the process 
of finding the (signed) area under the graph of a function, but is also the process 
of anti-differentiation and the two are linked by the Fundamental Theorem of 
Analysis, which states that

$$\int_a^b f(x)\,dx = [F(x)]_a^b = F(b) - F(a),$$

where F(x) is defined as any function such that dF(x)/dx = f(x), in which
case F(x) is called the ‘indefinite integral’ of f(x) and is written
$F(x) = \int f(x)\,dx$.

With this result in place, we know that finding the area under a curve becomes
a matter of anti-differentiation; for example,

$$\int_0^1 x\,dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1^2}{2} - \frac{0^2}{2} = \frac{1}{2}.$$

It would be nice if we could do the same in the complex case to get, for example,

$$\int_0^{1+i} z\,dz = \left[\frac{z^2}{2}\right]_0^{1+i} = \frac{(1 + i)^2}{2} - \frac{0^2}{2} = i.$$
In the real case there is never any choice about how the upper limit is approached
from the lower. The crucial point here is that the result would have no regard
for the infinite number of paths that could be taken to get from 0 to 1 + i. If we
are to have a Fundamental Theorem of Calculus in the complex case, the path has
to be irrelevant, which seems an overly optimistic hope. But consider the trivial
function f(z) = 1 integrated over any path connecting the points a and b:

$$\int_C 1\,dz = \lim_{n \to \infty}\bigl((z_1 - a)\cdot 1 + (z_2 - z_1)\cdot 1 + (z_3 - z_2)\cdot 1 + \cdots + (z_n - z_{n-1})\cdot 1\bigr) = \lim_{n \to \infty}(z_n - z_0) = b - a.$$

The path is indeed irrelevant, and we could write

$$\int_a^b 1\,dz = [z]_a^b = b - a.$$

A promising start, but things soon go wrong.

For example, if f(z) = Re(z) = x and we integrate from a = 0 to b = x + iy
along $C_1$ and $C_2$ as shown in Figure D.5, we get

$C_1$: z(t) = xt + iyt, 0 ≤ t ≤ 1,

to give z'(t) = x + iy and

$$\int_{C_1} \operatorname{Re}(z)\,dz = \int_0^1 xt(x + iy)\,dt = \tfrac{1}{2}x(x + iy) = \tfrac{1}{2}x^2 + \tfrac{1}{2}ixy;$$

$C_2$: $z_1(t) = t$, 0 ≤ t ≤ x, then $z_2(t) = x + it$, 0 ≤ t ≤ y,

and so $z_1'(t) = 1$ and $z_2'(t) = i$, to give

$$\int_{C_2} \operatorname{Re}(z)\,dz = \int_0^x t\cdot 1\,dt + \int_0^y x\,i\,dt = \tfrac{1}{2}x^2 + ixy,$$

which is hardly the same answer!
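Here is the same computation done numerically. We assume the end point b = X + iY with X = 2 and Y = 3; these values, the capital letters (used to avoid a clash with Re z = x), and the midpoint-rule helper are all illustrative choices of ours. The formulas in the text predict 2 + 3i along $C_1$ and 2 + 6i along $C_2$.

```python
def contour_integral(f, z, dz_dt, alpha, beta, n=2000):
    """Midpoint-rule approximation of the integral of f(z(t)) z'(t) dt from alpha to beta."""
    h = (beta - alpha) / n
    return sum(f(z(alpha + (k + 0.5) * h)) * dz_dt(alpha + (k + 0.5) * h) * h
               for k in range(n))

def real_part(z):
    return complex(z).real

X, Y = 2.0, 3.0   # the end point b = X + iY (values chosen for illustration)

# C1: the straight line z(t) = Xt + iYt, 0 <= t <= 1.
c1 = contour_integral(real_part, lambda t: X * t + 1j * Y * t, lambda t: X + 1j * Y, 0.0, 1.0)

# C2: along the real axis to X, then vertically up to X + iY.
c2 = (contour_integral(real_part, lambda t: t, lambda t: 1, 0.0, X)
      + contour_integral(real_part, lambda t: X + 1j * t, lambda t: 1j, 0.0, Y))

print(c1, 0.5 * X**2 + 0.5j * X * Y)   # compare with x^2/2 + ixy/2
print(c2, 0.5 * X**2 + 1j * X * Y)     # compare with x^2/2 + ixy
```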
Now consider f(z) = 1/z, which we can think of in two separate ways: it
is defined and analytic in any annular region centred on the origin, or it is not
defined at the origin and therefore not analytic in any disc centred on the origin.
Suppose that we allow the annular region and the disc to contain the unit circle
C, defined by |z| = 1; then we have

$$z(t) = \cos t + i\sin t, \qquad z'(t) = -\sin t + i\cos t, \qquad 0 \le t \le 2\pi,$$

and

$$\oint_C \frac{1}{z}\,dz = \int_0^{2\pi} \frac{-\sin t + i\cos t}{\cos t + i\sin t}\,dt = \int_0^{2\pi} \frac{i(\cos t + i\sin t)}{\cos t + i\sin t}\,dt = 2\pi i.$$

This is perhaps not at first surprising, but this is a closed contour, and if the
answer simply depended on the end points, it should be zero.
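A numerical sketch confirms the value 2πi. It uses the same parametrization of the unit circle and a midpoint-rule helper of our own. For comparison, the code also integrates z and 1/z² round the same circle. Both of these have single-valued antiderivatives there (z²/2 and −1/z), unlike 1/z, so their integrals should vanish.

```python
import cmath
import math

def contour_integral(f, z, dz_dt, alpha, beta, n=2000):
    """Midpoint-rule approximation of the integral of f(z(t)) z'(t) dt from alpha to beta."""
    h = (beta - alpha) / n
    return sum(f(z(alpha + (k + 0.5) * h)) * dz_dt(alpha + (k + 0.5) * h) * h
               for k in range(n))

# The unit circle z(t) = cos t + i sin t, 0 <= t <= 2*pi, traversed once anticlockwise.
circle = (lambda t: cmath.exp(1j * t), lambda t: 1j * cmath.exp(1j * t), 0.0, 2 * math.pi)

integrands = [("1/z", lambda z: 1 / z), ("z", lambda z: z), ("1/z^2", lambda z: 1 / z**2)]
results = {name: contour_integral(f, *circle) for name, f in integrands}
for name, value in results.items():
    print(name, value)
```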



D.7 The Seminal Result 

We will not prove the result, but the reconciliation is found in a consequence
of Cauchy’s Integral Theorem:

If f(z) is analytic inside a simply connected domain Δ, then
$\int_C f(z)\,dz$ has the same value for every contour C lying inside Δ
and joining the same two points.

From this it follows that, if the contour joins a and b,

$$\int_C f(z)\,dz = \int_a^b f(z)\,dz = F(b) - F(a),$$

where $F(z) = \int_a^z f(\zeta)\,d\zeta$ is the indefinite integral.

Notice also that this implies that if C is a closed contour, $\oint_C f(z)\,dz = 0$,
and the example with f(z) = 1/z above demonstrates that the analytic and simply
connected conditions are both necessary.

From this we can see that

$$\int_C f(z)\,dz = \int_{C_1} f(z)\,dz$$

if $C_1$ is any path continuously deformed from C, with its ends fixed; this is
called the Principle of Deformation of Path.
D.8 An Astonishing Consequence


Cauchy’s Integral Formula

If f(z) is analytic inside a simply connected domain Δ, then
for any simple closed contour C in Δ and any point z inside C,

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta.$$


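This means that at every point of the domain the function is determined by its
values on any simple, closed contour in the domain enclosing the point, a
result that has a most peculiar feel to it. To prove it, draw a circle $C_\rho$ of
radius ρ around z; then, by the deformation of path principle (see Figure D.6),

$$\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \oint_{C_\rho} \frac{f(\zeta)}{\zeta - z}\,d\zeta.$$

We now evaluate the right-hand side of the above expression:

$$\oint_{C_\rho} \frac{f(\zeta)}{\zeta - z}\,d\zeta = \oint_{C_\rho} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta + f(z)\oint_{C_\rho} \frac{d\zeta}{\zeta - z}.$$

By a simple translation, we know from before that

$$\oint_{C_\rho} \frac{d\zeta}{\zeta - z} = 2\pi i.$$

Now note that

$$\lim_{\zeta \to z} \frac{f(\zeta) - f(z)}{\zeta - z} = f'(z),$$

which is finite since f(z) is analytic. The quotient is also continuous for ζ ≠ z,
and therefore

$$\left|\frac{f(\zeta) - f(z)}{\zeta - z}\right| \le M \quad \text{for all } \zeta \ne z \text{ inside and on } C.$$

Consequently, by the ML inequality,

$$\left|\oint_{C_\rho} \frac{f(\zeta) - f(z)}{\zeta - z}\,d\zeta\right| \le M \cdot 2\pi\rho \to 0 \quad \text{as } \rho \to 0.$$

Since the left-hand side of the first equation does not depend on ρ, it follows that

$$\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = 2\pi i f(z) \quad\text{and}\quad f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta,$$

as required.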
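The formula can be tested numerically. The sketch below evaluates (1/2πi)∮ f(ζ)/(ζ − z) dζ round the unit circle using the trapezoidal rule in the angle. That rule is our choice; it is very accurate for smooth periodic integrands. The code compares the result with f(z) for two analytic functions. As a further check, for a point outside the circle the integrand is analytic inside C, so by Cauchy's Integral Theorem the value should be 0. The helper name `cauchy_integral` and the test points are assumptions for illustration.

```python
import cmath
import math

def cauchy_integral(f, z, centre=0j, radius=1.0, n=400):
    """(1/2 pi i) times the contour integral of f(zeta)/(zeta - z) round |zeta - centre| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        zeta = centre + radius * cmath.exp(1j * t)
        dzeta = 1j * (zeta - centre) * (2 * math.pi / n)   # zeta'(t) dt
        total += f(zeta) / (zeta - z) * dzeta
    return total / (2j * math.pi)

z = 0.3 + 0.2j            # a point inside the unit circle
print(cauchy_integral(cmath.exp, z), cmath.exp(z))
print(cauchy_integral(cmath.cos, z), cmath.cos(z))
print(cauchy_integral(cmath.exp, 2 + 0j))   # a point outside C
```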

In a sense, this means that any analytic function f(z) can be expressed 
in terms of a simple reciprocal function 1 /(£ — z), which has far-reaching 
implications. For example, an analytic function has derivatives of all orders. 

Again, this contrasts starkly with the real case, in which the differentiability 
of a function can easily come to an end; for example, f(x) = x|x| differentiates
to f'(x) = 2|x|, which is not differentiable at the origin.

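The proof is trivial if we allow repeated differentiation under the integral
sign (which is not hard to justify). Pick any simple closed contour C inside and
on which f(z) is analytic, and write

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta$$

to give

$$f'(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^2}\,d\zeta, \qquad f''(z) = \frac{2}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z)^3}\,d\zeta, \qquad \text{etc.},$$

and in general $f^{(n)}(z) = \dfrac{n!}{2\pi i}\oint_C \dfrac{f(\zeta)}{(\zeta - z)^{n+1}}\,d\zeta$.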
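Here is a sketch of the general derivative formula in Python. It takes C to be a circle of radius 1 centred on z and uses the trapezoidal rule; both are illustrative choices. The sketch recovers the first few derivatives of sin z, which should cycle through sin, cos, −sin, −cos. The helper name `nth_derivative` is hypothetical.

```python
import cmath
import math

def nth_derivative(f, z, n, radius=1.0, points=400):
    """f^(n)(z) = n!/(2 pi i) times the contour integral of f(zeta)/(zeta - z)^(n+1),
    taken round the circle |zeta - z| = radius with the trapezoidal rule."""
    total = 0j
    for k in range(points):
        w = radius * cmath.exp(2j * math.pi * k / points)   # w = zeta - z
        total += f(z + w) / w ** (n + 1) * (1j * w) * (2 * math.pi / points)
    return math.factorial(n) * total / (2j * math.pi)

z = 0.2 - 0.1j
expected = [cmath.sin(z), cmath.cos(z), -cmath.sin(z), -cmath.cos(z)]
for n, value in enumerate(expected):
    print(n, nth_derivative(cmath.sin, z, n), value)
```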


With this result, we can develop a part of the theory of expansions of analytic 
functions. 


D.9 Taylor Expansions— and an Important Consequence 


If we define the infinitely differentiable real function

$$f(x) = \begin{cases} e^{-1/x^2}, & x \ne 0, \\ 0, & x = 0, \end{cases}$$

and evaluate f(0), f'(0), f''(0), ... by taking the limit, each term will be 0,
with the exponential components dominating the powers of x. As a result, even
though the function is infinitely differentiable, it is impossible to represent it as
a Taylor series centred on x = 0: that series is identically zero, while f(x) > 0
for x ≠ 0. But once again, in the complex case, the severe restriction of
analyticity brings with it a stronger result. In this short section we establish
the Taylor expansion and several times build on results to arrive at a result of
huge significance.

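Assume that f(z) is analytic inside and on a circle C, centred at z = a; let z
lie inside the circle and ζ on it (see Figure D.7). Since ζ − z = (ζ − a) − (z − a),

$$\frac{1}{\zeta - z} = \frac{1}{(\zeta - a) - (z - a)} = \frac{1}{\zeta - a}\cdot\frac{1}{1 - (z - a)/(\zeta - a)}.$$

Clearly, |z − a| < |ζ − a|, and so |(z − a)/(ζ − a)| < 1 and the infinite binomial
expansion is valid to give

$$\frac{1}{\zeta - z} = \frac{1}{\zeta - a}\left[1 + \frac{z - a}{\zeta - a} + \left(\frac{z - a}{\zeta - a}\right)^2 + \cdots + \left(\frac{z - a}{\zeta - a}\right)^n + \cdots\right]$$

and

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - a}\left[1 + \frac{z - a}{\zeta - a} + \left(\frac{z - a}{\zeta - a}\right)^2 + \cdots\right]d\zeta,$$

$$\begin{aligned}
f(z) &= \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - a}\,d\zeta + \frac{z - a}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^2}\,d\zeta + \frac{(z - a)^2}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^3}\,d\zeta + \cdots + \frac{(z - a)^n}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{n+1}}\,d\zeta + \cdots \\
&= f(a) + (z - a)f'(a) + \frac{(z - a)^2}{2!}f''(a) + \frac{(z - a)^3}{3!}f'''(a) + \cdots + \frac{(z - a)^n}{n!}f^{(n)}(a) + \cdots \\
&= \sum_{r=0}^{\infty} A_r (z - a)^r,
\end{aligned}$$

where

$$A_r = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{r+1}}\,d\zeta.$$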


Again, the term-by-term integration can be easily justified and we have a guar- 
anteed (and unique) convergent Taylor expansion of the function in the disc. 
The formal series definitions of some of the standard functions we mentioned 
earlier can be made rigorous in this way. 

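Combine this with the ‘ML’ result on p. 235 and we see that the coefficients
$A_r$ satisfy the inequality

$$|A_r| = \left|\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - a)^{r+1}}\,d\zeta\right| \le \frac{1}{2\pi}\cdot\frac{M}{\rho^{r+1}}\cdot 2\pi\rho = \frac{M}{\rho^r},$$

where |f(ζ)| ≤ M on C, which has radius ρ.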

The very reasonable earlier result that |f(z)| = c ⇒ f(z) = k extends to a
very surprising one that again simply is not true in the real case. The function
f(x) = 1/(1 + x²), for example, is infinitely differentiable and bounded by
1, but it certainly isn’t constant, whereas in the complex case we have that a
bounded entire function is constant (this is Liouville’s Theorem), which is now
easy to prove.

Since the function is entire, we can expand it as a Taylor series about 0 to
get $f(z) = \sum_{r=0}^{\infty} A_r z^r$. Since f(z) is bounded in ℂ, |f(z)| ≤ M on
any disc centred on 0 and of radius ρ, and so $|A_r| \le M/\rho^r$ for r ≥ 1. Since
we may take ρ arbitrarily large, $A_r = 0$ for r ≥ 1 and $f(z) = A_0$, a constant.

Having proved that, it is but a small step to one of the cornerstones of the
whole of mathematics: 

The Fundamental Theorem of Algebra 
Any polynomial with coefficients in C has a root in C. 
Write the polynomial as $P(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$, with
n ≥ 1 and $a_n \ne 0$. If P(z) has no roots, f(z) = 1/P(z) is an entire function
and, for |z| sufficiently big (say, |z| > R), |P(z)| > 1 and so |f(z)| < 1. In the
disc |z| ≤ R, |f(z)| is clearly continuous, and it is a standard topological result
that it is therefore bounded. Consequently, f(z) is bounded in the whole of ℂ and,
using Liouville’s Theorem, it is constant. That is impossible, since P(z) is not
constant, and so P(z) must have a root. Having established one root in ℂ, we can
reduce the degree of the polynomial by 1 by factorization and repeat the process
to get the result that any polynomial of degree n with coefficients in ℂ has
precisely n roots in ℂ (counted with multiplicity).
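The proof guarantees that the n roots exist but gives no way of finding them. Purely as a numerical illustration, here is a sketch of the Durand-Kerner (Weierstrass) iteration. This is a standard root-finding method that the text does not discuss; it is swapped in here only to exhibit all n roots of two small examples. The starting values (0.4 + 0.9i)^k are the customary choice, and the iteration count and tolerance are our own assumptions.

```python
def poly_value(coeffs, z):
    """Evaluate a0 + a1 z + ... + an z^n by Horner's rule (coeffs = [a0, ..., an])."""
    result = 0j
    for c in reversed(coeffs):
        result = result * z + c
    return result

def durand_kerner(coeffs, iterations=500, tol=1e-14):
    """Approximate all n roots of a degree-n polynomial simultaneously."""
    n = len(coeffs) - 1
    monic = [c / coeffs[-1] for c in coeffs]        # divide through by a_n
    roots = [(0.4 + 0.9j) ** k for k in range(n)]   # customary distinct starting points
    for _ in range(iterations):
        biggest_step = 0.0
        for i in range(n):
            denom = 1
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = poly_value(monic, roots[i]) / denom
            roots[i] -= step                        # update in place (Gauss-Seidel style)
            biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break
    return roots

cube_roots_of_unity = durand_kerner([-1, 0, 0, 1])   # z^3 - 1
square_roots_of_minus_one = durand_kerner([1, 0, 1])  # z^2 + 1: no real roots
print(cube_roots_of_unity)
print(square_roots_of_minus_one)
```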

Seeing this result so neatly and easily proved belies the difficulty that was 
encountered initially to establish it, a task not made the easier by the mathemati- 
cians who attempted it having the deepest suspicions about complex numbers. In 
one of the most significant PhD theses ever, Gauss gave a first satisfactory proof
of the result in 1799, albeit for real coefficients, following incomplete attempts 
by Descartes, Euler, d’Alembert and Lagrange; in fact, over the course of his 
lifetime he produced four different proofs, the last one finally dealing with the 
case of complex coefficients. 


D.10 Laurent Expansions— and Another Important Consequence


Taylor expansion crucially needed analyticity in a simply connected region, but
suppose that the region was not simply connected, or that the function was not
everywhere analytic? For more than 20 years from 1821 Cauchy had developed
complex function theory virtually alone, until at last some of his fellow
countrymen began to mine the many rich ideas that he had exposed. In 1843
Pierre-Alphonse Laurent (1813-1854) answered this question by extending the
idea of Taylor series to what has become appropriately known as Laurent series
(Weierstrass had known about this in 1841 but had failed to publish his findings).
As with the function f(z) = 1/z, the result can be looked at either as a series
expansion of an analytic function in a region comprising a disc with a hole in it,
or as that of a function defined on a disc but having an isolated singularity, in
which case we can ‘cut it out’ by surrounding it with a small circle (see
Figure D.8).

Suppose that the singular point $z_0$ is surrounded by an inner circle $C_\rho$ and
that we perform a radial cut from C to $C_\rho$, thereby constructing a contour which
takes us all around C, radially inwards to $C_\rho$, all around that (in the opposite
direction), back along the radial line and then along to the start on C. This
results in a simply connected region in which f(z) is analytic, and so we can
apply Cauchy’s integral formula to get

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta - \frac{1}{2\pi i}\oint_{C_\rho} \frac{f(\zeta)}{\zeta - z}\,d\zeta,$$

the two equal and opposite contributions from the radial parts having cancelled
out (here both circles are traversed anticlockwise, which accounts for the minus
sign).
For ζ ∈ C, $|z - z_0| < |\zeta - z_0|$, and the same reasoning as was given for the
Taylor expansion gives

$$\frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{\zeta - z}\,d\zeta = \sum_{r=0}^{\infty} a_r (z - z_0)^r,$$

where

$$a_r = \frac{1}{2\pi i}\oint_C \frac{f(\zeta)}{(\zeta - z_0)^{r+1}}\,d\zeta.$$


The problem with the second integral is that f e C p and so |z — zo I > | ^ — Zo I 
and the geometric series that was developed will diverge — so we turn things 
upside down, since we can also write 


£ - z = (£ - zo) - (z - zo), 


and therefore 

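$$\frac{1}{\zeta - z} = \frac{1}{(\zeta - z_0) - (z - z_0)} = \frac{-1}{z - z_0}\cdot\frac{1}{1 - (\zeta - z_0)/(z - z_0)} = \frac{-1}{z - z_0}\left(1 + \frac{\zeta - z_0}{z - z_0} + \left(\frac{\zeta - z_0}{z - z_0}\right)^2 + \cdots + \left(\frac{\zeta - z_0}{z - z_0}\right)^n + \cdots\right)$$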
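and so

$$\begin{aligned}
-\frac{1}{2\pi i}\oint_{C_\rho}\frac{f(\zeta)}{\zeta - z}\,d\zeta &= \frac{1}{2\pi i}\oint_{C_\rho}\frac{f(\zeta)}{z - z_0}\left(1 + \frac{\zeta - z_0}{z - z_0} + \cdots + \left(\frac{\zeta - z_0}{z - z_0}\right)^{n} + \cdots\right)d\zeta\\
&= \frac{1}{2\pi i}\left(\frac{1}{z - z_0}\oint_{C_\rho} f(\zeta)\,d\zeta + \frac{1}{(z - z_0)^2}\oint_{C_\rho}(\zeta - z_0)f(\zeta)\,d\zeta + \cdots + \frac{1}{(z - z_0)^{n+1}}\oint_{C_\rho}(\zeta - z_0)^{n}f(\zeta)\,d\zeta + \cdots\right)\\
&= \sum_{r=1}^{\infty}\frac{b_r}{(z - z_0)^r},
\end{aligned}$$

where

$$b_r = \frac{1}{2\pi i}\oint_{C_\rho}(\zeta - z_0)^{r-1}f(\zeta)\,d\zeta.$$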


All of this makes

$$f(z) = \sum_{r=0}^{\infty} a_r (z - z_0)^r + \sum_{r=1}^{\infty}\frac{b_r}{(z - z_0)^r},$$

the promised Laurent series of the function.

It is important to note that, just as the Taylor expansion for a given function
is unique in its disc of convergence, so the Laurent expansion is unique in its
annulus of convergence, although the expansion can differ from one concentric
annulus to another. There are any number of examples of this phenomenon, for example,

$$\frac{1}{z(1+z)} = \frac{1}{z}\left(1 - z + z^2 - z^3 + \cdots\right) = \frac{1}{z} - 1 + z - z^2 + \cdots, \qquad 0 < |z| < 1,$$

$$\frac{1}{z(1+z)} = \frac{1}{z^2(1 + 1/z)} = \frac{1}{z^2}\left(1 - \frac{1}{z} + \frac{1}{z^2} - \frac{1}{z^3} + \cdots\right) = \frac{1}{z^2} - \frac{1}{z^3} + \frac{1}{z^4} - \frac{1}{z^5} + \cdots, \qquad 1 < |z| < 2,$$

where the upper bound of 2 is arbitrary (the second expansion holds for every $|z| > 1$). Laurent series have their uses,
just as Taylor series have their uses, but in pole position among them is their
application to the calculation of what are known as residues and, through that,
to the evaluation of real and complex definite integrals.
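The coefficient formulas for $a_r$ and $b_r$ can be checked against the two expansions of $1/(z(1+z))$. The sketch below is illustrative only: it assumes $z_0 = 0$, uses Cauchy's theorem to move both contour integrals onto a single circle $|\zeta| = R$ lying inside the annulus, approximates $c_n = \frac{1}{2\pi i}\oint \zeta^{-n-1}f(\zeta)\,d\zeta$ by the trapezoidal rule (so $c_n = a_n$ for $n \ge 0$ and $c_{-r} = b_r$), and the helper name `laurent_coefficient` is hypothetical.

```python
import cmath


def laurent_coefficient(f, n, radius, samples=256):
    """Hypothetical helper: approximate c_n = (1/(2*pi*i)) * contour integral of
    f(zeta) / zeta**(n + 1) around |zeta| = radius (expansion about z0 = 0).

    With zeta = radius * e^{i theta}, d zeta = i * zeta * d theta, so the integral
    is the mean of f(zeta) * zeta**(-n) over equally spaced points on the circle
    (trapezoidal rule)."""
    total = 0j
    for k in range(samples):
        zeta = radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(zeta) * zeta ** (-n)
    return total / samples


def f(z):
    return 1 / (z * (1 + z))


# Annulus 0 < |z| < 1: expect 1/z - 1 + z - z^2 + ...
inner = {n: laurent_coefficient(f, n, 0.5) for n in range(-3, 3)}
# Annulus |z| > 1: expect 1/z^2 - 1/z^3 + 1/z^4 - ...
outer = {n: laurent_coefficient(f, n, 1.5) for n in range(-5, 1)}

for n in sorted(inner):
    print("inner", n, round(inner[n].real, 12))
for n in sorted(outer):
    print("outer", n, round(outer[n].real, 12))
```

On the circle of radius 0.5 the coefficient of $1/z$ comes out as 1, while on the circle of radius 1.5 it comes out as 0: that circle encloses both singularities, whose contributions of $+1$ and $-1$ cancel, which is exactly the annulus-dependence described above.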
D.11 The Calculus of Residues

Consider a function $f(z)$ defined and analytic in a domain A apart from a
singularity at $z = z_0$ (called a pole, which explains the earlier pun). Construct a
circle with $z_0$ as its centre; then, if C is any closed contour in A surrounding that
circle, $f(z)$ has a Laurent expansion as above in the annulus and the coefficient
of the first negative power term is

$$b_1 = \frac{1}{2\pi i}\oint_C f(\zeta)\,d\zeta$$

and so

$$\oint_C f(\zeta)\,d\zeta = 2\pi i\, b_1;$$

consequently, if we can find the value of $b_1$, we can evaluate the integral. It
is customary to call $b_1$ the residue of $f(z)$ at $z = z_0$ and write it as
$b_1 = \operatorname{Res}_{z=z_0} f(z)$, and so we have that

$$\oint_C f(\zeta)\,d\zeta = 2\pi i \operatorname*{Res}_{z=z_0} f(z).$$

By constructing circles around each singularity individually, the idea easily
generalizes to n singularities $z_1, z_2, \ldots, z_n$ inside C to give

$$\oint_C f(\zeta)\,d\zeta = 2\pi i\sum_{k=1}^{n}\operatorname*{Res}_{z=z_k} f(z),$$

which is known as Cauchy’s Residue Theorem. Now all we need are methods to
calculate the residues, of which there are many, and we will be able to evaluate
the integral.

We will assume that we have a simple pole, that is, one for which the Laurent
expansion has just one negative-power term, and look at two related methods.

1. The Laurent series is

$$f(z) = \frac{b_1}{z - z_0} + a_0 + a_1(z - z_0) + a_2(z - z_0)^2 + \cdots.$$

Multiplying both sides by $(z - z_0)$ gives

$$(z - z_0)f(z) = b_1 + (z - z_0)\left(a_0 + a_1(z - z_0) + a_2(z - z_0)^2 + \cdots\right).$$

And so

$$\operatorname*{Res}_{z=z_0} f(z) = b_1 = \lim_{z\to z_0}(z - z_0)f(z).$$


As an example, if $f(z) = \sin z/(z^2 + 1)$,

$$\operatorname*{Res}_{z=i} f(z) = \lim_{z\to i}(z - i)\frac{\sin z}{z^2 + 1} = \frac{\sin i}{2i} = \frac{1}{2}\sinh 1$$

and

$$\operatorname*{Res}_{z=-i} f(z) = \lim_{z\to -i}(z + i)\frac{\sin z}{z^2 + 1} = \frac{\sin(-i)}{-2i} = \frac{1}{2}\sinh 1.$$


If we integrate around a contour that does not enclose $z = \pm i$, the function
is analytic inside it and the integral must therefore be 0, but if we integrate around
$C = \{z : |z| = 2\}$,

$$\oint_C \frac{\sin z}{z^2 + 1}\,dz = 2\pi i\left(\tfrac{1}{2}\sinh 1 + \tfrac{1}{2}\sinh 1\right) = (2\pi\sinh 1)\,i.$$
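Both residue calculations, and the value of the integral around $|z| = 2$, can be cross-checked numerically. This is a minimal sketch under stated assumptions, not a general-purpose integrator: the contour integral is approximated by the trapezoidal rule on a circle (very accurate for smooth periodic integrands), the limit in method 1 is approximated by evaluating $(z - z_0)f(z)$ a small step $h$ away from the pole, and the helper names `circle_integral` and `residue_by_limit` are hypothetical.

```python
import cmath
import math


def circle_integral(f, center, radius, samples=2048):
    """Hypothetical helper: trapezoidal approximation of the anticlockwise
    contour integral of f around |z - center| = radius."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)
        total += f(center + w) * 1j * w  # dz = i * w * d(theta)
    return total * (2 * math.pi / samples)


def residue_by_limit(f, z0, h=1e-7):
    """Hypothetical helper for method 1: approximate lim (z - z0) f(z)
    by evaluating just off the pole."""
    z = z0 + h
    return (z - z0) * f(z)


def f(z):
    return cmath.sin(z) / (z * z + 1)


res_plus = residue_by_limit(f, 1j)
res_minus = residue_by_limit(f, -1j)
whole = circle_integral(f, 0, 2)   # |z| = 2 encloses both +i and -i
empty = circle_integral(f, 3, 1)   # a circle enclosing neither pole

print(res_plus, res_minus, 0.5 * math.sinh(1))
print(whole, 2j * math.pi * math.sinh(1))
print(abs(empty))
```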


2. In the first example, the denominator of the fraction was easily factorized;
suppose now that we have a quotient in which this is not the case.
Write $f(z) = p(z)/q(z)$, where $p(z)$ and $q(z)$ are analytic. Suppose that $f(z)$
has a simple pole at $z = z_0$, so that $p(z_0) \ne 0$ and $z_0$ is a simple zero of $q(z)$.
Expand $q(z)$ as a Taylor series about $z = z_0$ to give, since $q(z_0) = 0$,

$$\begin{aligned}
q(z) &= q(z_0) + (z - z_0)q'(z_0) + \frac{(z - z_0)^2}{2!}q''(z_0) + \cdots\\
&= (z - z_0)q'(z_0) + \frac{(z - z_0)^2}{2!}q''(z_0) + \cdots\\
&= (z - z_0)\left(q'(z_0) + \frac{z - z_0}{2!}q''(z_0) + \cdots\right).
\end{aligned}$$


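So,

$$\begin{aligned}
\operatorname*{Res}_{z=z_0} f(z) = b_1 &= \lim_{z\to z_0}(z - z_0)f(z) = \lim_{z\to z_0}(z - z_0)\frac{p(z)}{q(z)}\\
&= \lim_{z\to z_0}\frac{(z - z_0)\,p(z)}{(z - z_0)\left\{q'(z_0) + ((z - z_0)/2!)\,q''(z_0) + \cdots\right\}}\\
&= \frac{p(z_0)}{q'(z_0)}.
\end{aligned}$$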


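For example, if $f(z) = (z^2 + 1)/\sin z$,

$$\operatorname*{Res}_{z=0} f(z) = \frac{0^2 + 1}{\cos 0} = 1$$

and, more generally,

$$\operatorname*{Res}_{z=k\pi} f(z) = \frac{(k\pi)^2 + 1}{\cos k\pi} = \begin{cases} (k\pi)^2 + 1, & k \text{ even},\\ -\left((k\pi)^2 + 1\right), & k \text{ odd}.\end{cases}$$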
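The formula $p(z_0)/q'(z_0)$ can likewise be compared with a direct contour integral around each pole of $(z^2 + 1)/\sin z$. In this sketch (helper names hypothetical) each contour is a circle of radius 1 about $k\pi$, which is assumed to enclose only that pole because neighbouring zeros of $\sin z$ are $\pi$ apart.

```python
import cmath
import math


def circle_integral(f, center, radius, samples=2048):
    """Hypothetical helper: trapezoidal approximation of the anticlockwise
    contour integral of f around |z - center| = radius."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)
        total += f(center + w) * 1j * w
    return total * (2 * math.pi / samples)


def f(z):
    return (z * z + 1) / cmath.sin(z)


def residue_formula(k):
    # p(z0) / q'(z0) with p(z) = z^2 + 1, q(z) = sin z and z0 = k * pi
    z0 = k * math.pi
    return (z0 * z0 + 1) / math.cos(z0)


for k in range(-3, 4):
    numeric = circle_integral(f, k * math.pi, 1.0) / (2j * math.pi)
    print(k, round(numeric.real, 9), round(residue_formula(k), 9))
```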
D.12 Analytic Continuation

Recall that the result is as follows. 

If, in some complex domain A, two analytic functions are defined 
and are equal at all points on a curve C lying inside A, they are 
equal throughout A. 

We can now prove it as follows. 

Let the two analytic functions be $f_1(z)$ and $f_2(z)$, defined in some region A
of $\mathbb{C}$, and write their difference as $\varphi(z) = f_1(z) - f_2(z)$. Then $\varphi(z)$ is analytic
throughout A and identically 0 on C. Now suppose that there is a point $z_0 \in A$
at which $\varphi(z_0) \ne 0$; clearly, $z_0 \notin C$. Now extend C inside A by a curve D,
heading towards $z_0$, and let $\zeta$ be the last point on D for which $\varphi(z) = 0$; then
$\zeta \ne z_0$ and on the segment of the curve D beyond $\zeta$, $\varphi(z) \ne 0$, by definition of
$\zeta$. If we differentiate $\varphi(z)$ at points on the curve up to $\zeta$ by taking the limit along
the curve, we must have $\varphi(z) = \varphi'(z) = \varphi''(z) = \cdots = 0$ and, in particular,
$\varphi(\zeta) = \varphi'(\zeta) = \varphi''(\zeta) = \cdots = 0$. Now expand $\varphi(z)$ as a Taylor series about
the point $z = \zeta$; then all of the coefficients are 0 and so $\varphi(z) = 0$ in some circle
centred at $z = \zeta$ and consequently on some of the curve beyond $\zeta$, which is a
contradiction, and the result is established (see Figure D.9).

In general, how a given function achieves its continuation (if indeed it has 
one) depends on the function, and there can be any number of equivalent ways, 
leading to expressions that look different but must in fact be the same. 
Application to the Zeta Function


E.1 Zeta Analytically Continued


In the first part of his paper, Riemann performed the analytic continuation of
the Euler Zeta function

$$\zeta(x) = \sum_{r=1}^{\infty}\frac{1}{r^x},$$


which we already know requires $x > 1$ for convergence, and so the function is
defined as in Figure E.1. If we simply replace $x \in \mathbb{R}$ with $z \in \mathbb{C}$, we have a
continuation to

$$\zeta(z) = \sum_{r=1}^{\infty}\frac{1}{r^z},$$


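a complex-valued function of a complex variable. We would expect the complex
form to inherit a similar restriction and so it does, as we can see from

$$\begin{aligned}
\sum_{r=1}^{\infty}\left|\frac{1}{r^z}\right| &= \sum_{r=1}^{\infty}\left|\frac{1}{e^{z\ln r}}\right| = \sum_{r=1}^{\infty}\left|\frac{1}{e^{(\mathrm{Re}(z) + i\,\mathrm{Im}(z))\ln r}}\right|\\
&= \sum_{r=1}^{\infty}\left|\frac{1}{e^{\mathrm{Re}(z)\ln r}\,e^{i\,\mathrm{Im}(z)\ln r}}\right| = \sum_{r=1}^{\infty}\frac{1}{e^{\mathrm{Re}(z)\ln r}} = \sum_{r=1}^{\infty}\frac{1}{r^{\mathrm{Re}(z)}},
\end{aligned}$$

which we know converges only for $\mathrm{Re}(z) > 1$. So $\zeta(z)$ makes sense in this
domain; pictorially, the shaded region in Figure E.2.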
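The key step, that the factor $e^{i\,\mathrm{Im}(z)\ln r}$ has modulus 1 and so drops out, is easy to confirm numerically for an arbitrarily chosen sample point $z$ (the value used below is an illustration only):

```python
import cmath
import math

z = 1.5 + 7.0j  # arbitrary sample point with Re(z) > 1

for r in range(1, 8):
    term = r ** -z  # r^{-z} = e^{-z ln r}
    print(r, abs(term), r ** -z.real)

# The phase factor e^{-i Im(z) ln r} (here with r = 3) has modulus 1
print(abs(cmath.exp(-1j * z.imag * math.log(3))))
```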
Figure E.1. (The real interval x > 1.)

Figure E.2. (Axes Re z and Im z; the region Re z > 1.)


Euler’s product formula remains valid for complex numbers with $\mathrm{Re}(z) > 1$ and makes clear
that this extended function still has no zeros there; so far, this is pretty straightforward
stuff. Now for that analytic continuation, which Riemann realized using contour
integration.

The complex extension

$$\Gamma(z)\zeta(z) = \int_0^{\infty}\frac{u^{z-1}}{e^u - 1}\,du$$

of the formula we derived on p. 60, which is valid for $\mathrm{Re}(z) > 1$, suggests a
contour integral


$$I(z) = \frac{1}{2\pi i}\int_{\mathcal{C}}\frac{u^{z-1}}{e^{-u} - 1}\,du, \qquad \mathrm{Re}(z) > 1,$$

for some contour $\mathcal{C}$. A useful choice is a path coming from $-\infty$ just below and
parallel to the real axis, circling the origin anticlockwise and returning
to $-\infty$ parallel to and just above the real axis.

Integrate around $C_1$, $C_2$, $C_3$ separately, therefore putting

$$u = re^{-\pi i}, \qquad u = \rho e^{i\theta}, \qquad u = re^{\pi i},$$

respectively, since on $C_3$ we are going out to minus infinity (effectively) along
the negative real axis, making the argument $\pi$; on $C_1$ we are coming in from minus infinity,
making the argument $-\pi$; and on $C_2$ we are traversing a circle of radius $\rho$. Then


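$$\begin{aligned}
2\pi i\, I(z) &= \int_\infty^\rho \frac{r^{z-1}e^{-\pi i z}e^{\pi i}e^{-\pi i}}{e^r - 1}\,dr + \int_{-\pi}^{\pi}\frac{\rho^{z-1}e^{iz\theta}e^{-i\theta}\,\rho i e^{i\theta}}{e^{-\rho e^{i\theta}} - 1}\,d\theta + \int_\rho^\infty \frac{r^{z-1}e^{\pi i z}e^{-\pi i}e^{\pi i}}{e^r - 1}\,dr\\
&= \left(e^{\pi i z} - e^{-\pi i z}\right)\int_\rho^\infty \frac{r^{z-1}}{e^r - 1}\,dr + i\int_{-\pi}^{\pi}\frac{\rho^{z}e^{iz\theta}}{e^{-\rho e^{i\theta}} - 1}\,d\theta.
\end{aligned}$$

So,

$$I(z) = \frac{\sin(\pi z)}{\pi}\int_\rho^\infty \frac{r^{z-1}}{e^r - 1}\,dr + \frac{\rho^z}{2\pi}\int_{-\pi}^{\pi}\frac{e^{iz\theta}}{e^{-\rho e^{i\theta}} - 1}\,d\theta.$$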


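Taking each integral separately,

$$\begin{aligned}
\left|\frac{\rho^z}{2\pi}\int_{-\pi}^{\pi}\frac{e^{iz\theta}}{e^{-\rho e^{i\theta}} - 1}\,d\theta\right| &\le \frac{\rho^{\mathrm{Re}(z)}}{2\pi}\int_{-\pi}^{\pi}\left|\frac{e^{iz\theta}}{e^{-\rho e^{i\theta}} - 1}\right|d\theta\\
&= \frac{\rho^{\mathrm{Re}(z)}}{2\pi}\int_{-\pi}^{\pi}e^{-\mathrm{Im}(z)\theta}\left|\frac{1}{e^{-\rho e^{i\theta}} - 1}\right|d\theta\\
&= \frac{\rho^{\mathrm{Re}(z)-1}}{2\pi}\int_{-\pi}^{\pi}e^{-\mathrm{Im}(z)\theta}\left|\frac{\rho e^{i\theta}}{e^{-\rho e^{i\theta}} - 1}\right|d\theta.
\end{aligned}$$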
Figure E.4. (Axes Re z and Im z; the point z = 1 is marked ‘Undefined’.)


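But

$$\left|\frac{\rho e^{i\theta}}{e^{-\rho e^{i\theta}} - 1}\right| = \left|\frac{u}{e^{-u} - 1}\right|$$

is bounded for small $|u|$ (its singularity at $u = 0$ is removable, the nearest poles being
at $u = \pm 2\pi i$), let us say by the constant A.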
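Therefore,

$$\left|\frac{\rho^z}{2\pi}\int_{-\pi}^{\pi}\frac{e^{iz\theta}}{e^{-\rho e^{i\theta}} - 1}\,d\theta\right| \le \frac{\rho^{\mathrm{Re}(z)-1}}{2\pi}\int_{-\pi}^{\pi}A\,e^{\pi|\mathrm{Im}(z)|}\,d\theta = A\rho^{\mathrm{Re}(z)-1}e^{\pi|\mathrm{Im}(z)|}$$

and, if $\mathrm{Re}(z) > 1$, this tends to 0 as $\rho \to 0$, and so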


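$$\pi I(z) = \lim_{\rho\to 0}\sin(\pi z)\int_\rho^{\infty}\frac{r^{z-1}}{e^r - 1}\,dr = \sin(\pi z)\int_0^{\infty}\frac{r^{z-1}}{e^r - 1}\,dr = \sin(\pi z)\,\Gamma(z)\zeta(z),$$

and so

$$\zeta(z) = \frac{\pi I(z)}{\sin(\pi z)\,\Gamma(z)},$$

and since $\Gamma(z)\Gamma(1 - z) = \pi/\sin(\pi z)$ we have that

$$\zeta(z) = \frac{\Gamma(1 - z)}{2\pi i}\int_{\mathcal{C}}\frac{u^{z-1}}{e^{-u} - 1}\,du,$$

which is defined and finite for all $z \ne 1$ (at $z = 2, 3, 4, \ldots$ the poles of
$\Gamma(1 - z)$ are cancelled by zeros of the contour integral).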

The domain of definition is now as in Figure E.4. 
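The continued formula can be evaluated numerically, which also shows that it agrees with the original series where both make sense. This is a sketch under stated assumptions, not Riemann's procedure: it uses the split of $I(z)$ into a straight-line part and a circle of radius $\rho$ derived above, but keeps $\rho = 1$ fixed instead of letting $\rho \to 0$ (any $0 < \rho < 2\pi$ should give the same $I(z)$ by Cauchy's theorem, since the nearest poles of the integrand are at $\pm 2\pi i$), truncates the real integral at $r = 50$, uses Simpson's rule, and is restricted to real $z$ that are not positive integers because the standard-library `math.gamma` cannot evaluate $\Gamma(1 - z)$ at its poles. The function names are hypothetical; `zeta_series` (a truncated sum with an Euler–Maclaurin tail correction, valid for $s > 1$) is included only for comparison.

```python
import cmath
import math


def simpson(g, a, b, n):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def zeta_continued(z, rho=1.0, upper=50.0):
    """Hypothetical helper: zeta(z) = Gamma(1 - z) * I(z) for real z, z not a positive integer.

    I(z) = sin(pi z)/pi * (integral of r^(z-1)/(e^r - 1) from rho to infinity)
         + rho^z/(2 pi) * (integral of e^(i z t)/(e^(-rho e^(i t)) - 1) for t from -pi to pi),
    with the infinite upper limit truncated at `upper`.
    """
    line = simpson(lambda r: r ** (z - 1) / math.expm1(r), rho, upper, 20000)
    circle = simpson(
        lambda t: cmath.exp(1j * z * t) / (cmath.exp(-rho * cmath.exp(1j * t)) - 1),
        -math.pi, math.pi, 4000)
    i_of_z = math.sin(math.pi * z) / math.pi * line + rho ** z / (2 * math.pi) * circle
    return math.gamma(1 - z) * i_of_z


def zeta_series(s, n_terms=1000):
    """Hypothetical comparison helper for real s > 1: direct sum plus Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, n_terms + 1))
    n = n_terms
    return head + n ** (1 - s) / (s - 1) - n ** -s / 2 + s * n ** (-s - 1) / 12


print(zeta_continued(2.5).real, zeta_series(2.5))  # both defined here, should agree
print(zeta_continued(0.5).real)   # a value the original series cannot give
print(zeta_continued(-1.0).real)  # compare with -1/12
```

Changing `rho` to another value in $(0, 2\pi)$ should leave the result unchanged up to quadrature error, a numerical echo of the contour-independence used in the derivation.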


Figure E.5. Outer radius $R = (2N + 1)\pi$.


E.2 Zeta’s Functional Relationship 


We are going to ‘trap’ $I(z)$, evaluating it by integrating around a second contour,
which in the limit is the same as the one above.

Consider the contour integral

$$I_N(z) = \frac{1}{2\pi i}\oint_{C_N}\frac{u^{z-1}}{e^{-u} - 1}\,du,$$

where $\mathrm{Re}(z) < 0$, with the contour shown in Figure E.5 for N a positive integer.
On the outer circle we have $u = Re^{i\theta}$, $-\pi \le \theta \le \pi$, and


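$$\begin{aligned}
\left|\frac{u^{z-1}}{e^{-u} - 1}\right| &= \left|\frac{(Re^{i\theta})^{z-1}}{e^{-u} - 1}\right| = \left|\frac{R^{z-1}e^{i\theta(\mathrm{Re}(z) + i\,\mathrm{Im}(z))}e^{-i\theta}}{e^{-u} - 1}\right|\\
&= \left|\frac{R^{\mathrm{Re}(z)-1}R^{i\,\mathrm{Im}(z)}e^{i\theta\,\mathrm{Re}(z)}e^{-\theta\,\mathrm{Im}(z)}e^{-i\theta}}{e^{-u} - 1}\right|\\
&= e^{-\theta\,\mathrm{Im}(z)}R^{\mathrm{Re}(z)-1}\left|\frac{1}{e^{-u} - 1}\right| \le R^{\mathrm{Re}(z)-1}e^{\pi|\mathrm{Im}(z)|}\left|\frac{1}{e^{-u} - 1}\right| \le R^{\mathrm{Re}(z)-1}e^{\pi|\mathrm{Im}(z)|}A,
\end{aligned}$$

since $|1/(e^{-u} - 1)|$ is bounded, by A say, on these circles.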

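The outer circle has length $2\pi R$, so its contribution to $I_N(z)$ is at most
$\frac{1}{2\pi}\cdot 2\pi R\cdot AR^{\mathrm{Re}(z)-1}e^{\pi|\mathrm{Im}(z)|} = AR^{\mathrm{Re}(z)}e^{\pi|\mathrm{Im}(z)|}$. So, since $\mathrm{Re}(z) < 0$,
as $N, R \to \infty$ the contribution to the integral from this part of the contour
tends to 0, and therefore in the limit $I_N(z)$ reduces to an integral along the contour used for $I(z)$.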
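The claim that $|1/(e^{-u} - 1)|$ stays bounded on the circles of radius $(2N + 1)\pi$, which pass midway between the poles, can be probed numerically. A sketch, assuming a fixed grid of sample angles (so it only estimates the maximum); the helper name is hypothetical. For contrast it also samples circles of radius $2N\pi + 0.001$, which pass very close to the pole at $2N\pi i$.

```python
import cmath
import math


def max_on_circle(radius, samples=4000):
    """Hypothetical helper: largest sampled value of |1/(e^(-u) - 1)| on |u| = radius."""
    best = 0.0
    for k in range(samples):
        u = radius * cmath.exp(2j * math.pi * k / samples)
        best = max(best, abs(1 / (cmath.exp(-u) - 1)))
    return best


for n in (1, 2, 5, 10, 20):
    midway = max_on_circle((2 * n + 1) * math.pi)   # stays close to 1
    near_pole = max_on_circle(2 * n * math.pi + 0.001)  # blows up near u = 2 n pi i
    print(n, midway, near_pole)
```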

The function $f(u) = u^{z-1}/(e^{-u} - 1)$ has poles where $e^{-u} - 1 = 0$, and so inside $C_N$ at
$u = 2k\pi i$ for $k = 1, 2, \ldots, N$ and for $k = -1, -2, \ldots, -N$ (which is why
the outer radius is taken to be $(2N + 1)\pi$). If we are to use Cauchy’s Residue
Theorem to evaluate $I_N(z)$, we will need the residues at each of these poles, and
so we will use the theory of residues to find them:

$$\operatorname*{Res}_{u=2k\pi i} f(u) = \operatorname*{Res}_{u=2k\pi i}\frac{p(u)}{q(u)} = \frac{p(2k\pi i)}{q'(2k\pi i)} = \frac{(2k\pi i)^{z-1}}{-1} = -(2k\pi i)^{z-1}.$$


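So, using $i = e^{\pi i/2}$ and $-i = e^{-\pi i/2}$,

$$\begin{aligned}
I_N(z) &= \frac{1}{2\pi i}\oint_{C_N}\frac{u^{z-1}}{e^{-u} - 1}\,du = -\sum_{k=1}^{N}\left\{(2k\pi i)^{z-1} + (-2k\pi i)^{z-1}\right\}\\
&= -\sum_{k=1}^{N}(2\pi k)^{z-1}\left\{e^{\pi(z-1)i/2} + e^{-\pi(z-1)i/2}\right\}\\
&= -\sum_{r=1}^{N}(2\pi r)^{z-1}\,2\cos\left(\pi(z - 1)/2\right)\\
&= -2(2\pi)^{z-1}\sin(\pi z/2)\sum_{r=1}^{N} r^{z-1}.
\end{aligned}$$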
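Both the residue $-(2k\pi i)^{z-1}$ and the closing simplification of the finite sum can be spot-checked numerically. This sketch assumes that Python's principal-branch complex power matches the branch used above (argument in $(-\pi, \pi]$, so $i = e^{\pi i/2}$ and $-i = e^{-\pi i/2}$); the sample point $z$, the cut-off $N$ and the helper name are arbitrary choices for illustration.

```python
import cmath
import math


def circle_integral(f, center, radius, samples=1024):
    """Hypothetical helper: trapezoidal approximation of the anticlockwise
    contour integral of f around |u - center| = radius."""
    total = 0j
    for k in range(samples):
        w = radius * cmath.exp(2j * math.pi * k / samples)
        total += f(center + w) * 1j * w
    return total * (2 * math.pi / samples)


z = -0.5 + 1.3j  # arbitrary sample point with Re(z) < 0


def f(u):
    # Principal branch of u^(z-1); the small circle below never meets the negative real axis
    return u ** (z - 1) / (cmath.exp(-u) - 1)


# Residue at u = 2 pi i: numerical contour integral versus -(2 pi i)^(z-1)
numeric = circle_integral(f, 2j * math.pi, 0.5) / (2j * math.pi)
print(numeric, -(2j * math.pi) ** (z - 1))

# Sum of residues at +-2k pi i (k = 1..N) versus the closed form
N = 25
direct = -sum((2j * k * math.pi) ** (z - 1) + (-2j * k * math.pi) ** (z - 1)
              for k in range(1, N + 1))
closed = -2 * (2 * math.pi) ** (z - 1) * cmath.sin(math.pi * z / 2) * sum(
    k ** (z - 1) for k in range(1, N + 1))
print(direct, closed)
```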


Now we recognize that we have integrated around the contour in the opposite
direction to that used for $I(z)$, so that $I(z) = -\lim_{N\to\infty} I_N(z)$, and we have that

$$I(z) = -\lim_{N\to\infty} I_N(z) = 2(2\pi)^{z-1}\sin(\pi z/2)\sum_{r=1}^{\infty} r^{z-1} = 2(2\pi)^{z-1}\sin(\pi z/2)\,\zeta(1 - z),$$

with the convergence guaranteed, as $\mathrm{Re}(1 - z) = 1 - \mathrm{Re}(z) > 1$.

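Each form of $I(z)$ was established using a different assumption about $\mathrm{Re}(z)$,
but the uniqueness of analytic continuation allows this to be disregarded and,
combining these two forms for $I(z)$, we get

$$\zeta(z) = \frac{2\pi(2\pi)^{z-1}\sin(\pi z/2)\,\zeta(1 - z)}{\sin(\pi z)\,\Gamma(z)} = \frac{(2\pi)^z\sin(\pi z/2)\,\zeta(1 - z)}{\sin(\pi z)\,\Gamma(z)},$$

and we have the promised functional relationship $\zeta(1 - z) = \chi(z)\zeta(z)$: the relation
$\sin(\pi z)\Gamma(z)\zeta(z) = (2\pi)^z\sin(\pi z/2)\zeta(1 - z)$ holds for all $z \ne 1$ and, using
$\sin(\pi z) = 2\sin(\pi z/2)\cos(\pi z/2)$, becomes

$$\zeta(1 - z) = 2(2\pi)^{-z}\cos(\pi z/2)\,\Gamma(z)\zeta(z),$$

so that $\chi(z) = 2(2\pi)^{-z}\cos(\pi z/2)\Gamma(z)$.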
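As a sanity check on the final form, the right-hand side can be evaluated at real $z > 1$, where $\zeta(z)$ is given by its series, and compared with the classical values $\zeta(-1) = -1/12$ and $\zeta(-3) = 1/120$ (standard results not derived in this appendix). A minimal sketch, assuming a truncated series with an Euler–Maclaurin tail correction for $\zeta(z)$; the function names are hypothetical.

```python
import math


def zeta_series(s, n_terms=1000):
    """Hypothetical helper: zeta(s) for real s > 1, direct sum plus Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, n_terms + 1))
    n = n_terms
    return head + n ** (1 - s) / (s - 1) - n ** -s / 2 + s * n ** (-s - 1) / 12


def zeta_one_minus(z):
    """Hypothetical helper: zeta(1 - z) = 2 (2 pi)^(-z) cos(pi z / 2) Gamma(z) zeta(z), real z > 1."""
    return 2 * (2 * math.pi) ** -z * math.cos(math.pi * z / 2) * math.gamma(z) * zeta_series(z)


print(zeta_one_minus(2))  # zeta(-1)
print(zeta_one_minus(4))  # zeta(-3)
print(zeta_one_minus(3))  # zeta(-2): cos(3 pi / 2) = 0, so zero up to rounding
```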